Count of daily users in PostgreSQL, including days where the count is zero


I am using the query below to get the count of users created per day:
SELECT
DATE_TRUNC('day', created),
COUNT(*)
FROM users
GROUP BY DATE_TRUNC('day', created)
but the dates where the count is 0 do not show up.
I know about generate_series, but I couldn't get it to work here.
select date_trunc('day',created_at)as day,
count(id) as signup from users
where created_at between '2022-1-11' and now()
group by 1
order by 1
I tried rewriting the query as above, but it still doesn't work.
How can I get an entry for the days where the count is zero?

A left join with a generate_series call might be what you're looking for:
SELECT j.d, count(u.*)
FROM generate_series('2022-11-01'::date,CURRENT_DATE,interval '1 day') j (d)
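-- match every row created during day j.d, not only rows stamped exactly at midnight
LEFT JOIN users u ON u.created >= j.d AND u.created < j.d + interval '1 day'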
GROUP BY 1;
Demo: db<>fiddle
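For anyone who wants to try the zero-filling pattern without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up, and because a stock SQLite build may not ship generate_series, a recursive CTE stands in for it here. The point is only the shape of the query: build the calendar first, then LEFT JOIN the data onto it with a half-open day range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, created TEXT);
    -- hypothetical sample data: two signups on Nov 1, one on Nov 3
    INSERT INTO users (created) VALUES
        ('2022-11-01 09:15:00'),
        ('2022-11-01 17:40:00'),
        ('2022-11-03 08:05:00');
""")

rows = conn.execute("""
    -- recursive CTE used as a stand-in for generate_series
    WITH RECURSIVE days(d) AS (
        SELECT date('2022-11-01')
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < '2022-11-04'
    )
    SELECT days.d, COUNT(u.id)          -- COUNT(u.id) is 0 when nothing matched
    FROM days
    LEFT JOIN users u
           ON u.created >= days.d
          AND u.created <  date(days.d, '+1 day')
    GROUP BY days.d
    ORDER BY days.d
""").fetchall()

for day, signups in rows:
    print(day, signups)
```

Counting a column of the joined table (u.id) rather than COUNT(*) matters: COUNT(*) would report 1 for the empty days, because the calendar row itself is still there.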

Related

Filling in empty dates

This query returns the number of alarms created by day between a specific date range.
SELECT CAST(created_at AS DATE) AS date, SUM(1) AS count
FROM ew_alarms
LEFT JOIN site ON site.id = ew_alarms.site_id
AND ew_alarms.created_at BETWEEN '12/22/2020' AND '01/22/2021' AND (CAST(EXTRACT(HOUR FROM ew_alarms.created_at) AS INT) BETWEEN 0 AND 23.99)
GROUP BY CAST(created_at AS DATE)
ORDER BY date DESC
Result: screenshot
What is the best way to fill in the missing dates (1/16, 1/17, 1/18, etc.)? Because no alarms were created on those days, the results throw off the daily average I'm ultimately trying to compute.
Would it be a generate_series query?

Yes, use generate_series(). I would suggest:
SELECT gs.dte AS date, COUNT(s.id) AS count
FROM GENERATE_SERIES('2020-12-22'::date, '2021-01-22'::date, INTERVAL '1 DAY') gs(dte)
LEFT JOIN ew_alarms a
  ON a.created_at >= gs.dte AND
     a.created_at < gs.dte + INTERVAL '1 DAY'
LEFT JOIN site s
  ON s.id = a.site_id
GROUP BY gs.dte
ORDER BY date DESC;
I don't know what the hour comparison is supposed to be doing. The hour is always going to be between 0 and 23, so I removed that logic.
Note: Presumably, you want to count something from either site or ew_alarms. Counting a column from a LEFT JOINed table, rather than COUNT(*), is what lets 0 be returned for days with no matching rows.
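To see why the missing days matter for the daily average mentioned in the question, here is a small Python sketch with made-up counts. It is not tied to the ew_alarms schema; it only compares averaging over the days that happen to appear in a GROUP BY result with averaging over every day in the range.

```python
from datetime import date, timedelta

# Hypothetical alarm counts keyed by day. Days without alarms are simply
# absent, as they would be in the result of a plain GROUP BY.
counts = {date(2021, 1, 15): 4, date(2021, 1, 19): 2}

start, end = date(2021, 1, 15), date(2021, 1, 19)
n_days = (end - start).days + 1
all_days = [start + timedelta(days=i) for i in range(n_days)]

# Fill the gaps with zeros, which is what the calendar LEFT JOIN does in SQL.
filled = {d: counts.get(d, 0) for d in all_days}

avg_present_only = sum(counts.values()) / len(counts)  # divides by 2 days
avg_all_days = sum(filled.values()) / len(filled)      # divides by 5 days

print(avg_present_only, avg_all_days)
```

With these numbers the average drops from 3.0 to 1.2 once the three empty days are counted.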

How to Average Number of Chats per Day on LEFT JOIN table in Snowflake SQL?

In the Snowflake SQL dialect, how do I average the number of video chats per day using a field from a table that I left-joined into the query?
I think I need a SUM to total the video chats, aggregate the number of video chats for each date, and then divide by 30 days (the rolling date range I use throughout the query).
Any help would be appreciated as deadlines are approaching. Thank you.
SELECT DISTINCT
t1."pid",
IFNULL(t2."VideoChats",0),
t3."SFUser",
t3."TotalProviders",
t4."dimaccount.practice_specialty",
t5."Account: CMRR",
t6."CreatedDate",
t7."stg_sf_case.Date_Time_Resolved__c",
t8."stg_sf_case.Closed_Date",
t9."pid"
FROM (SELECT "pid"
FROM "EDW_PROD"."PUBLIC"."STG_MYSQL_PROVIDERMODULES" AS a
WHERE a."active"
AND a."status" = 'PURCHASED'
AND a."module_id" = '14'
GROUP BY a."pid"
) t1
LEFT JOIN (SELECT "started_at",
"pid",
COUNT(*) AS "VideoChats"
FROM "EDW_PROD"."PUBLIC"."STG_MYSQL_VIDEOCHATROOM" AS b
LEFT JOIN "EDW_PROD"."PUBLIC"."DIMACCOUNT" AS dimaccount
ON b."pid" = dimaccount."PID"
WHERE b."started_at" >= DATE_TRUNC('month', CURRENT_DATE())
AND b."started_at" < DATEADD('month', 1, DATE_TRUNC('month', CURRENT_DATE()))
AND dimaccount."CurrentRow" = 'Y'
GROUP BY b."pid", b."started_at"
) t2 ON t1."pid" = t2."pid"

For a rolling average you probably want to use a window function. Something along these lines.
SELECT AVG("VideoChats") over (partition by "pid" order by "started_at" rows between 29 preceding and current row) as AvgVideoChats
--I saw a post about AVG not allowing a sliding window, so you may have to do this instead
SELECT SUM("VideoChats") over (partition by "pid" order by "started_at" rows between 29 preceding and current row) / 30. as AvgVideoChats
You may need to do this in a wrapper around your t2 query and adjust your date filters so that there are values available for averaging, but I'm not quite clear enough on what your query is doing with dates, or what results you are looking for, to be sure.
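Here is a small runnable sketch of the sliding-window idea, using Python's sqlite3 module instead of Snowflake (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The daily_chats table and its numbers are invented, with one row per pid per day, and the window is 3 rows instead of 30 to keep the output readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical pre-aggregated table: one row per pid per day
    CREATE TABLE daily_chats (pid INTEGER, day TEXT, chats INTEGER);
    INSERT INTO daily_chats VALUES
        (1, '2023-01-01', 3),
        (1, '2023-01-02', 0),
        (1, '2023-01-03', 6),
        (1, '2023-01-04', 3);
""")

rows = conn.execute("""
    SELECT day,
           AVG(chats) OVER (
               PARTITION BY pid
               ORDER BY day
               -- 2 PRECEDING plus the current row = a 3-row window
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS avg_chats
    FROM daily_chats
    ORDER BY day
""").fetchall()

for day, avg_chats in rows:
    print(day, avg_chats)
```

Two things to keep in mind. A frame of N - 1 PRECEDING AND CURRENT ROW covers N rows. And a ROWS frame counts rows, not calendar days, so it only matches a day-based window if every day has a row, zero-count days included.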

How many distinct active users did I have on a 90 day window? [duplicate]

This question already has answers here:
Query for count of distinct values in a rolling date range
(5 answers)
Closed 6 years ago.
I have a complex problem that seems to be trivial at first sight:
for a given 90 day window, how many distinct active users did I have?
The table I will use to query this is the login table (hosted in Redshift), and it has a timestamp with the logintime and usertoken as the user identifier.
Whenever I want to answer this for a single day, the query is easy and straightforward:
select count (distinct usertoken)
from logins
where datediff('d',logintime,getdate()) <= 90
The problem becomes complex because I want to have this in a table with the number for every given date.
07/07 100k
07/06 98k
07/05 99k
07/04 101k
(...)
Window functions do not help me because I need to count distinct, and this is not possible in a window function.
To my knowledge, there is no way to iterate in a SQL query.
How should I go about this?

Perhaps I am missing something, but from what I understand this should do:
-- In SQL Server
select cast(logintime As Date), count(distinct usertoken) from logins
where datediff(D, logintime, getdate()) <= 90
group by cast(logintime As Date)
In Redshift (PostgreSQL-based):
change cast(logintime As Date) to date_trunc('day', logintime)
and datediff(D, logintime, getdate()) to datediff('d', logintime, getdate())

I am assuming that if a day has zero users logging in you don't mind not showing it in the list.
First we get a set of all the days we care about and call that set "days".
with days as (
select date_trunc('day', date) as day from logins
where date > now() - '90 days'::interval
group by day
)
Then we join the days set with the logins.
select day, count(distinct userid)
from days
join logins on date_trunc('day', logins.date) = days.day
group by day
order by day

The trivial way is very computationally expensive:
select days.d, count(distinct l.userid)
from (select distinct date_trunc('day', logintime) as d
from logins l
) days left join
(select distinct userid, date_trunc('day', logintime) as d
from logins
) l
on datediff('d', l.d, days.d) between 0 and 89
group by days.d
order by days.d;
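A toy version of this range self-join can be run with Python's sqlite3 module. This is only a sketch: the data is invented, the window is 2 days instead of 90, and julianday() arithmetic stands in for Redshift's datediff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (usertoken TEXT, logintime TEXT);
    -- hypothetical sample logins
    INSERT INTO logins VALUES
        ('a', '2023-07-01 10:00:00'),
        ('b', '2023-07-01 11:00:00'),
        ('a', '2023-07-02 09:00:00'),
        ('c', '2023-07-03 12:00:00'),
        ('c', '2023-07-04 08:00:00');
""")

WINDOW_DAYS = 2  # each day counts users seen that day or the day before

rows = conn.execute("""
    SELECT days.d, COUNT(DISTINCT l.usertoken)
    FROM (SELECT DISTINCT date(logintime) AS d FROM logins) AS days
    LEFT JOIN (SELECT DISTINCT usertoken, date(logintime) AS d
               FROM logins) AS l
           -- join each report day to every login day inside its window
           ON (julianday(days.d) - julianday(l.d)) BETWEEN 0 AND ?
    GROUP BY days.d
    ORDER BY days.d
""", (WINDOW_DAYS - 1,)).fetchall()

for day, active_users in rows:
    print(day, active_users)
```

The cost comes from the join: each report day is paired with every (user, day) row in its window before the distinct count is taken, which is why this gets expensive on a large login table.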

Select one row per day for each value

I have a SQL query in PostgreSQL 9.4 that, while more complex due to the tables I am pulling data from, boils down to the following:
SELECT entry_date, user_id, <other_stuff>
FROM <tables, joins, etc>
WHERE <whatever limits I want, such as limiting the date range or users>
GROUP BY entry_date, user_id
With the result that I have one row per user, per day for which I have data. In general, this query would be run for an entry_date period of one month, with the desired result of having one row per day of the month for each user.
The problem is that there may not be data for every user every day of the month, and this query only returns rows for days that have data.
Is there some way to modify this query so it returns one row per day for each user, even if there is no data (other than the date and the user) in some of the rows?
I tried doing a join with a generate_series(), but that didn't work - it can make there be no missing days, but not per user. What I really need would be something like "for each user in list, generate series of (user,date) records"
EDIT: To clarify, the final result that I am looking for would be that for each user in the database - defined as a record in a user table - I want one row per date. So if I specify a date range of 5/1/15-5/31/15 in my where clause, I want 31 rows per user, even if that user had no data in that range, or only had data for a couple of days.

generate_series() was the right idea. You probably did not get the details right. Could work like this:
WITH cte AS (
SELECT entry_date, user_id, <other_stuff>
FROM <tables, joins, etc>
WHERE <whatever limits I want>
GROUP BY entry_date, user_id
)
SELECT *
FROM (SELECT DISTINCT user_id FROM cte) u
CROSS JOIN (
SELECT entry_date::date
FROM generate_series(current_date - interval '1 month'
, current_date - interval '1 day'
, interval '1 day') entry_date
) d
LEFT JOIN cte USING (user_id, entry_date);
I picked a running time window of one month ending "yesterday". You did not define your "month" exactly.
Assuming entry_date to be data type date.
Simpler for your updated requirements
To get results for every user in a users table (and not for a current selection) and for your given time range, it gets simpler. You don't need the CTE:
SELECT *
FROM (SELECT user_id FROM users) u
CROSS JOIN (
SELECT entry_date::date
FROM generate_series(timestamp '2015-05-01'
, timestamp '2015-05-31'
, interval '1 day') entry_date
) d
LEFT JOIN (
SELECT entry_date, user_id, <other_stuff>
FROM <tables, joins, etc>
WHERE <whatever>
GROUP BY entry_date, user_id
) t USING (user_id, entry_date);
Why this particular way to call generate_series()? See "Generating time series between two dates in PostgreSQL".
It is also best to use the ISO 8601 date format (YYYY-MM-DD), which works regardless of locale settings.
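The "every user times every date" grid can be tried out with Python's sqlite3 module. Treat this as a sketch under assumptions: the users and entries tables and the amount column are invented, the range is three days instead of a month, and a recursive CTE replaces generate_series, which a stock SQLite build may not include.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical schema and data
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE entries (user_id INTEGER, entry_date TEXT, amount INTEGER);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO entries VALUES
        (1, '2015-05-01', 10),
        (1, '2015-05-03', 5),
        (2, '2015-05-02', 7);
""")

rows = conn.execute("""
    WITH RECURSIVE d(entry_date) AS (
        SELECT date('2015-05-01')
        UNION ALL
        SELECT date(entry_date, '+1 day') FROM d WHERE entry_date < '2015-05-03'
    )
    SELECT u.user_id, d.entry_date, SUM(e.amount) AS total
    FROM users u
    CROSS JOIN d                      -- one row per (user, day) pair
    LEFT JOIN entries e               -- attach data where it exists
           ON e.user_id = u.user_id AND e.entry_date = d.entry_date
    GROUP BY u.user_id, d.entry_date
    ORDER BY u.user_id, d.entry_date
""").fetchall()

for row in rows:
    print(row)
```

Each of the two users gets three rows, and the days without data carry NULL (None in Python) in the total column.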

RedShift: Alternative to 'where in' to compare annual login activity

Here are the two cases:
Members Lost: Get the distinct count of user ids from 365 days ago who haven't had any activity since then
Members Added: Get the distinct count of user ids from today who don't exist in the previous 365 days.
Here are the SQL statements I've been writing. Logically I feel like this should work (and it does for sample data), but the dataset has 5 million+ rows and the queries take forever! Is there any way to do this more efficiently? (base_date is a calendar table that I'm joining on to build out a 2-year trend. I figured this was faster than joining the 5-million-row table to itself...)
-- Members Lost
SELECT
effective_date,
COUNT(DISTINCT dwuserid) as members_lost
FROM base_date
LEFT JOIN site_visit
-- Get Login Activity for 365th day
ON DATEDIFF(day, srclogindate, effective_date) = 365
WHERE dwuserid NOT IN (
-- Get Distinct Login activity for Current Day (PY) + 1 to Current Day (CY) (i.e. 2013-01-02 to 2014-01-01)
SELECT DISTINCT dwuserid
FROM site_visit b
WHERE DATEDIFF(day, b.srclogindate, effective_date) BETWEEN 0 AND 364
)
GROUP BY effective_date
ORDER BY effective_date;
-- Members Added
SELECT
effective_date,
COUNT(DISTINCT dwuserid) as members_added
FROM base_date
LEFT JOIN site_visit ON srclogindate = effective_date
WHERE dwuserid NOT IN (
SELECT DISTINCT dwuserid
FROM site_visit b
WHERE DATEDIFF(day, b.srclogindate, effective_date) BETWEEN 1 AND 365
)
GROUP BY effective_date
ORDER BY effective_date;
Thanks in advance for any help.
UPDATE
Thanks to @JohnR for pointing me in the right direction. I had to tweak your response a bit because I need to know, for any login day, how many users were "Members Added" or "Members Lost", so it had to be a rolling 365-day window looking back or looking forward. Finding the IDs that didn't have a match in the LEFT JOIN was much faster.
-- Trim data down to one user login per day
CREATE TABLE base_login AS
SELECT DISTINCT "dwuserid", "srclogindate"
FROM site_visit;
-- Members Lost
SELECT
current."srclogindate",
COUNT(DISTINCT current."dwuserid") as "members_lost"
FROM base_login current
LEFT JOIN base_login future
ON current."dwuserid" = future."dwuserid"
AND current."srclogindate" < future."srclogindate"
AND DATEADD(day, 365, current."srclogindate") >= future."srclogindate"
WHERE future."dwuserid" IS NULL
GROUP BY current."srclogindate";
-- Members Added
SELECT
current."srclogindate",
COUNT(DISTINCT current."dwuserid") as "members_added"
FROM base_login current
LEFT JOIN base_login past
ON current."dwuserid" = past."dwuserid"
AND current."srclogindate" > past."srclogindate"
AND DATEADD(day, 365, past."srclogindate") >= current."srclogindate"
WHERE past."dwuserid" IS NULL
GROUP BY current."srclogindate";

NOT IN should generally be avoided on large tables because it can force a scan of all the data.
Instead of joining to the site_visit table (which is presumably huge), try joining to a sub-query that selects each user's ID together with their first and most recent login dates. That way, there is only one row per user instead of one row per visit.
For example:
SELECT dwuserid, min (srclogindate) as first_login, max(srclogindate) as last_login
FROM site_visit
GROUP BY dwuserid
You could then simplify the queries to something like:
-- Members Lost: Last login was between 12 and 13 months ago
SELECT
COUNT(*)
FROM
(
SELECT dwuserid, min(srclogindate) as first_login, max(srclogindate) as last_login
FROM site_visit
GROUP BY dwuserid
) AS user_logins -- a derived table in FROM needs an alias
WHERE
last_login BETWEEN current_date - interval '13 months' and current_date - interval '12 months';
-- Members Added: First visit in last 12 months
SELECT
COUNT(*)
FROM
(
SELECT dwuserid, min(srclogindate) as first_login, max(srclogindate) as last_login
FROM site_visit
GROUP BY dwuserid
) AS user_logins -- a derived table in FROM needs an alias
WHERE
first_login > current_date - interval '12 months';
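The one-row-per-user idea can be checked on toy data with Python's sqlite3 module. This is a rough sketch, not a Redshift query: the visits are invented, a fixed reference date replaces current_date so the result is reproducible, and SQLite's date() modifiers replace the interval arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site_visit (dwuserid INTEGER, srclogindate TEXT);
    -- hypothetical sample visits
    INSERT INTO site_visit VALUES
        (1, '2012-12-20'),   -- last seen between 12 and 13 months before REF
        (2, '2012-06-01'),
        (2, '2013-11-15'),   -- long-standing user, still active
        (3, '2013-09-10'),   -- first seen within the last 12 months
        (4, '2013-12-01');   -- first seen within the last 12 months
""")

REF = "2014-01-01"  # fixed reference date instead of current_date

lost, added = conn.execute("""
    SELECT
        -- comparisons evaluate to 0 or 1 in SQLite, so SUM counts matches
        SUM(last_login BETWEEN date(:ref, '-13 months')
                           AND date(:ref, '-12 months')),
        SUM(first_login > date(:ref, '-12 months'))
    FROM (
        SELECT dwuserid,
               MIN(srclogindate) AS first_login,
               MAX(srclogindate) AS last_login
        FROM site_visit
        GROUP BY dwuserid
    ) AS per_user
""", {"ref": REF}).fetchone()

print("members lost:", lost, "members added:", added)
```

Note that this gives the counts for a single reference date. The per-day trend described in the question's update still needs one evaluation per day, or the LEFT JOIN ... IS NULL approach shown there.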
