Those who listened to more than 10 mins each month in the last 6 months - sql

I'm trying to get the count of users who listened for more than 10 minutes in each month of the last 6 months.
We have an event, Song_stopped_listen, and one of its attributes is session_progress_ms.
I want to see the monthly evolution of this cohort's size over the last 6 months.
I'm using BigQuery, and this is the query I tried. Something seems off semantically, but I can't put my finger on it:
SELECT
  CONCAT(CAST(EXTRACT(YEAR FROM DATE(timestamp)) AS STRING), "-",
         CAST(EXTRACT(MONTH FROM DATE(timestamp)) AS STRING)) AS date,
  SUM(absl.session_progress_ms / (1000*60*10)) AS total_10_ms,
  COUNT(DISTINCT u.id) AS total_10_listeners
FROM ios.song_stopped_listen AS absl
LEFT JOIN ios.users u ON absl.user_id = u.id
WHERE absl.timestamp > '2018-05-01'
GROUP BY 1
HAVING (total_10_ms > 1)
Please help me figure out what I'm doing wrong here.
Thank you.
Data sample:
user_id | session_progress_ms | timestamp
1 | 10000 | 2017-10-10 14:34:25.656 UTC
What I want:
Month-year | Count of users who listened to more than 10 mins
2018-5 | 500
2018-6 | 600
2018-7 | 300
2018-8 | 5100
2018-9 | 4500
2018-10 | 1500
2018-11 | 1500
2018-12 | 2500

Use multiple levels of aggregation:
select user_id
from (select ssl.user_id, timestamp_trunc(ssl.timestamp, month) as mon,
             sum(ssl.session_progress_ms / (1000 * 60)) as total_minutes
      from ios.song_stopped_listen as ssl
      where date(ssl.timestamp) < date_trunc(current_date, month) and
            date(ssl.timestamp) >= date_sub(date_trunc(current_date, month), interval 6 month)
      group by 1, 2
     ) u
where total_minutes >= 10
group by user_id
having count(*) = 6;
To get the count, just use this as a subquery with count(*).
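One way to sanity-check the two-level aggregation is to run it on toy data. The sketch below re-expresses the query in SQLite through Python's sqlite3 module, because BigQuery can't run locally. strftime('%Y-%m', ...) stands in for timestamp_trunc. The window is hard-coded to July through December 2018 rather than derived from current_date. The rows are hypothetical.

```python
import sqlite3

# Hypothetical six-month window (July to December 2018), hard-coded instead of current_date.
MONTHS = ["2018-07", "2018-08", "2018-09", "2018-10", "2018-11", "2018-12"]


def build_db():
    """Create an in-memory table with made-up listening events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE song_stopped_listen (user_id INTEGER, session_progress_ms INTEGER, ts TEXT)"
    )
    rows = []
    for m in MONTHS:
        rows.append((1, 11 * 60_000, f"{m}-15 10:00:00"))  # user 1: 11 min every month
        rows.append((3, 6 * 60_000, f"{m}-03 08:00:00"))   # user 3: two 6-min sessions,
        rows.append((3, 6 * 60_000, f"{m}-20 21:30:00"))   # i.e. 12 min per month in total
        rows.append((4, 9 * 60_000, f"{m}-10 12:00:00"))   # user 4: only 9 min per month
    for m in MONTHS[:5]:
        rows.append((2, 11 * 60_000, f"{m}-05 09:00:00"))  # user 2: misses December
    rows.append((2, 11 * 60_000, "2018-06-05 09:00:00"))   # outside the window, filtered out
    conn.executemany("INSERT INTO song_stopped_listen VALUES (?, ?, ?)", rows)
    return conn


# Inner level: minutes per user per month (1000.0 avoids SQLite integer division).
# Outer level: keep users with exactly 6 qualifying months, one per month in the window.
QUALIFIED_SQL = """
SELECT user_id
FROM (SELECT user_id,
             strftime('%Y-%m', ts) AS mon,
             SUM(session_progress_ms / (1000.0 * 60)) AS total_minutes
      FROM song_stopped_listen
      WHERE ts >= '2018-07-01' AND ts < '2019-01-01'
      GROUP BY user_id, mon) u
WHERE total_minutes >= 10
GROUP BY user_id
HAVING COUNT(*) = 6
ORDER BY user_id
"""

conn = build_db()
qualified = [row[0] for row in conn.execute(QUALIFIED_SQL)]
# Wrapping the same query in COUNT(*) gives the cohort size.
(cohort_size,) = conn.execute(f"SELECT COUNT(*) FROM ({QUALIFIED_SQL}) q").fetchone()
print(qualified, cohort_size)  # [1, 3] 2
```

User 3 qualifies only because the inner level sums both sessions per month. User 2 drops out at the HAVING step because one month in the window is missing.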

Related

How to count two date fields in the same month in SQL

I'm trying to write a query over a table with 3 columns:
C_Time: the task creation date/time
Done_Time: the task completion date/time
User ID: the unique ID of the user
I want to get the total count of tasks created in a particular month and the total number of tasks done in that same month, grouped by user ID.
The output would look like:
UserID | CreatedCount | DoneCount
U12 | 12 | 12
U13 | 7 | 5
Here user U12 created 12 tasks and completed 12 tasks in January 2020, while user U13 created 7 tasks in January 2020 and completed 5 tasks in the same month.
You can use apply to unpivot the data and then aggregate:
select t.user_id, sum(v.is_create) as CreatedCount, sum(v.is_complete) as DoneCount
from t cross apply
     (values (t.c_time, 1, 0), (t.done_time, 0, 1)
     ) v(ts, is_create, is_complete)
where v.ts >= '2020-01-01' and v.ts < '2020-02-01'
group by t.user_id;
You can also do this with conditional aggregation:
select user_id,
       sum(case when c_time >= '2020-01-01' and c_time < '2020-02-01' then 1 else 0 end) as CreatedCount,
       sum(case when done_time >= '2020-01-01' and done_time < '2020-02-01' then 1 else 0 end) as DoneCount
from t
group by user_id;
This is probably a little faster for your particular example. However, the first version is more generalizable: for instance, it lets you summarize easily by both user and month.
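The sketch below runs both ideas in SQLite through Python's sqlite3 on a handful of hypothetical tasks. SQLite has no CROSS APPLY, so the unpivot is done with UNION ALL instead. That is a swapped-in technique, not the answer's exact syntax. The conditional-aggregation query is essentially unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id TEXT, c_time TEXT, done_time TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("U12", "2020-01-03 09:00", "2020-01-04 10:00"),
    ("U12", "2020-01-10 09:00", "2020-01-20 10:00"),
    ("U13", "2019-12-28 09:00", "2020-01-02 10:00"),  # created in Dec, done in Jan
    ("U13", "2020-01-05 09:00", "2020-01-06 10:00"),
    ("U13", "2020-01-15 09:00", "2020-02-02 10:00"),  # done in Feb
    ("U13", "2020-01-25 09:00", None),                # not done yet
])

# Conditional aggregation for a single month (January 2020).
# A NULL done_time makes the comparison NULL, so the ELSE 0 branch applies.
JANUARY_SQL = """
SELECT user_id,
       SUM(CASE WHEN c_time >= '2020-01-01' AND c_time < '2020-02-01' THEN 1 ELSE 0 END) AS created_count,
       SUM(CASE WHEN done_time >= '2020-01-01' AND done_time < '2020-02-01' THEN 1 ELSE 0 END) AS done_count
FROM t
GROUP BY user_id
ORDER BY user_id
"""

# Unpivot: each task becomes a "created" event plus, if it has one, a "done" event.
# Grouping the events by user and month generalizes to every month at once.
BY_MONTH_SQL = """
SELECT user_id, strftime('%Y-%m', ts) AS mon,
       SUM(is_create) AS created_count, SUM(is_complete) AS done_count
FROM (SELECT user_id, c_time AS ts, 1 AS is_create, 0 AS is_complete FROM t
      UNION ALL
      SELECT user_id, done_time, 0, 1 FROM t WHERE done_time IS NOT NULL) v
GROUP BY user_id, mon
ORDER BY user_id, mon
"""

january = conn.execute(JANUARY_SQL).fetchall()
by_month = conn.execute(BY_MONTH_SQL).fetchall()
print(january)   # [('U12', 2, 2), ('U13', 3, 2)]
print(by_month)
```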

Vertica SQL for running count distinct and running conditional count

I'm trying to build a department-level score table based on a more granular, product-URL-level score table.
Dates are not consecutive.
Not all URLs get score updates on the same day (they are independent of each other).
dist_url should be a running count distinct (cumulative count distinct).
Both dist urls and urls score >=30 are count distincts.
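What I have now:
Date | url | Store | Dept | Page | Score
10/1 | a | US | A | X | 10
10/1 | b | US | A | X | 30
10/1 | c | US | A | X | 60
10/4 | a | US | A | X | 20
10/4 | d | US | A | X | 60
10/6 | b | US | A | X | 22
10/9 | a | US | A | X | 40
10/9 | e | US | A | X | 10
What I want:
Date | Store | Dept | Page | dist urls | urls score >=30
10/1 | US | A | X | 3 | 2
10/4 | US | A | X | 4 | 3
10/6 | US | A | X | 4 | 2
10/9 | US | A | X | 5 | 2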
I think dist_url can be done with a window function; I'm just not sure about the query.
My current query is below, but it's wrong because it doesn't compute a cumulative count distinct:
SELECT
bm.AnalysisDate,
su.SoID AS Store,
su.DptCaID AS DTID,
su.PageTypeID AS PTID,
COUNT(DISTINCT bm.SeoURLID) AS NumURLsWithDupScore,
SUM(CASE WHEN bm.DuplicationScore > 30 THEN 1 ELSE 0 END) AS Over30Count
FROM csn_seo.tblBotifyMetrics bm
INNER JOIN csn_seo.tblSEOURLs su
ON bm.SeoURLID = su.ID
WHERE su.DptCaID IS NOT NULL
AND su.DptCaID <> 0
AND su.PageTypeID IS NOT NULL
AND su.PageTypeID <> -1
AND bm.iscompliant = 1
GROUP BY bm.AnalysisDate, su.SoID, su.DptCaID, su.PageTypeID;
Please let me know if anyone has an idea.
Based on your question, you seem to want two levels of logic:
select date, store, dept, page,
       sum(sum(start)) over (partition by store, dept, page order by date) as distinct_urls,
       sum(sum(start_30)) over (partition by store, dept, page order by date) as distinct_urls_30
from ((select store, dept, page, url, min(date) as date, 1 as start, 0 as start_30
from t
group by store, dept, page, url
) union all
(select store, dept, page, url, min(date) as date, 0, 1
from t
where score >= 30
group by store, dept, page, url
)
) t
group by date, store, dept, page;
I don't understand how your query is related to your question.
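Here is a sketch of the first-appearance idea in SQLite through Python's sqlite3. It uses the sample rows from the question, with an assumed year of 2019. It uses an explicit per-day subquery instead of nesting SUM(SUM(...)), which should give the same numbers.

This logic has two consequences:
* A date on which no URL appears for the first time (10/6 here) produces no row.
* The >= 30 column counts URLs that have ever reached 30, not those currently at 30 or more.

```python
import sqlite3

# Sample rows from the question; the year 2019 is an assumption.
ROWS = [
    ("2019-10-01", "a", "US", "A", "X", 10),
    ("2019-10-01", "b", "US", "A", "X", 30),
    ("2019-10-01", "c", "US", "A", "X", 60),
    ("2019-10-04", "a", "US", "A", "X", 20),
    ("2019-10-04", "d", "US", "A", "X", 60),
    ("2019-10-06", "b", "US", "A", "X", 22),
    ("2019-10-09", "a", "US", "A", "X", 40),
    ("2019-10-09", "e", "US", "A", "X", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, url TEXT, store TEXT, dept TEXT, page TEXT, score INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", ROWS)

SQL = """
SELECT date,
       SUM(new_urls) OVER (PARTITION BY store, dept, page ORDER BY date) AS distinct_urls,
       SUM(new_urls_30) OVER (PARTITION BY store, dept, page ORDER BY date) AS distinct_urls_30
FROM (
    -- Step 2: how many URLs make their first appearance on each date
    SELECT store, dept, page, date,
           SUM(start) AS new_urls, SUM(start_30) AS new_urls_30
    FROM (
        -- Step 1a: the first date each URL appears at all
        SELECT store, dept, page, url, MIN(date) AS date, 1 AS start, 0 AS start_30
        FROM t GROUP BY store, dept, page, url
        UNION ALL
        -- Step 1b: the first date each URL reaches a score of 30 or more
        SELECT store, dept, page, url, MIN(date), 0, 1
        FROM t WHERE score >= 30 GROUP BY store, dept, page, url
    ) firsts
    GROUP BY store, dept, page, date
) per_day
ORDER BY date
"""

result = conn.execute(SQL).fetchall()
print(result)  # [('2019-10-01', 3, 2), ('2019-10-04', 4, 3), ('2019-10-09', 5, 4)]
```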
Try as I might, I can't reproduce your output either.
But I think you can avoid the UNION SELECTs. Does this do what you expect?
NULLs are ignored by COUNT(DISTINCT ...), and here you can combine an aggregate expression with an OLAP (window) one.
And Vertica has named windows to improve readability.
WITH
input(Date,url,Store,Dept,Page,Score) AS (
SELECT DATE '2019-10-01','a','US','A','X',10
UNION ALL SELECT DATE '2019-10-01','b','US','A','X',30
UNION ALL SELECT DATE '2019-10-01','c','US','A','X',60
UNION ALL SELECT DATE '2019-10-04','a','US','A','X',20
UNION ALL SELECT DATE '2019-10-04','d','US','A','X',60
UNION ALL SELECT DATE '2019-10-06','b','US','A','X',22
UNION ALL SELECT DATE '2019-10-09','a','US','A','X',40
UNION ALL SELECT DATE '2019-10-09','e','US','A','X',10
)
SELECT
date
, store
, dept
, page
, SUM(COUNT(DISTINCT url) ) OVER(w) AS dist_urls
, SUM(COUNT(DISTINCT CASE WHEN score >=30 THEN url END)) OVER(w) AS dist_urls_gt_30
FROM input
GROUP BY
date
, store
, dept
, page
WINDOW w AS (PARTITION BY store,dept,page ORDER BY date)
;
-- out date | store | dept | page | dist_urls | dist_urls_gt_30
-- out ------------+-------+------+------+-----------+-----------------
-- out 2019-10-01 | US | A | X | 3 | 2
-- out 2019-10-04 | US | A | X | 5 | 3
-- out 2019-10-06 | US | A | X | 6 | 3
-- out 2019-10-09 | US | A | X | 8 | 4
-- out (4 rows)
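The sketch below shows what SUM(COUNT(DISTINCT ...)) OVER (...) actually computes by reproducing the output above in SQLite through Python's sqlite3. It splits the nesting into two steps: an inner GROUP BY for the per-day distinct counts, then an outer window SUM for the running total. The result is a running sum of per-day distinct counts. A URL that appears on several days (such as a) is therefore counted once per day, so the numbers come out larger than a true cumulative count distinct.

```python
import sqlite3

ROWS = [
    ("2019-10-01", "a", "US", "A", "X", 10),
    ("2019-10-01", "b", "US", "A", "X", 30),
    ("2019-10-01", "c", "US", "A", "X", 60),
    ("2019-10-04", "a", "US", "A", "X", 20),
    ("2019-10-04", "d", "US", "A", "X", 60),
    ("2019-10-06", "b", "US", "A", "X", 22),
    ("2019-10-09", "a", "US", "A", "X", 40),
    ("2019-10-09", "e", "US", "A", "X", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input (date TEXT, url TEXT, store TEXT, dept TEXT, page TEXT, score INTEGER)")
conn.executemany("INSERT INTO input VALUES (?, ?, ?, ?, ?, ?)", ROWS)

SQL = """
SELECT date,
       SUM(day_urls) OVER (PARTITION BY store, dept, page ORDER BY date) AS dist_urls,
       SUM(day_urls_30) OVER (PARTITION BY store, dept, page ORDER BY date) AS dist_urls_gt_30
FROM (
    -- Distinct URLs per day; the CASE yields NULL below 30, and NULLs are not counted
    SELECT date, store, dept, page,
           COUNT(DISTINCT url) AS day_urls,
           COUNT(DISTINCT CASE WHEN score >= 30 THEN url END) AS day_urls_30
    FROM input
    GROUP BY date, store, dept, page
) per_day
ORDER BY date
"""

result = conn.execute(SQL).fetchall()
print(result)
```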

SQL sum and previous row [duplicate]

This question already has answers here:
Calculate a Running Total in SQL Server
I have the following table:
date | amount
01-01-2019 | 10
01-01-2019 | 10
01-01-2019 | 10
01-01-2019 | 10
02-01-2019 | 5
02-01-2019 | 5
02-01-2019 | 5
02-01-2019 | 5
03-01-2019 | 20
03-01-2019 | 20
These are mutation values by date. I would like my query to return the running (cumulative) total by date. So for 02-01-2019 I need 40 (4 times 10) + 20 (4 times 5). For 03-01-2019 I would need 40 (4 times 10) + 20 (4 times 5) + 40 (2 times 20), and so on. Is this possible in one query? How do I achieve this?
My current query, which gets the summed mutations per date:
Select s.date,
Sum(s.amount) As Sum_amount
From dbo.Financieel As s
Group By s.date
You can try the query below:
select dateval,
SUM(amt) OVER(ORDER BY dateval ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as amt
from
(
SELECT
dateval,
SUM(amount) amt
FROM t2 group by dateval
)A
OUTPUT:
dateval amt
01/01/2019 00:00:00 40
01/02/2019 00:00:00 60
01/03/2019 00:00:00 100
Try the script below to get your desired output:
SELECT A.date,
(SELECT SUM(amount) FROM <your_table> WHERE Date <= A.Date) C_Total
FROM <your_table> A
GROUP BY date
ORDER BY date
Output:
date C_Total
01-01-2019 40
02-01-2019 60
03-01-2019 100
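I suggest using a window function, like this (with the default frame, all rows sharing a date get the same running total, so DISTINCT collapses them to one row per date):
select distinct date, sum(amount) over (order by date) as running_total
from dbo.Financieel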
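A small SQLite sketch, run through Python's sqlite3, checks the window approach and the correlated-subquery approach side by side. The table name financieel and the ISO date strings are assumptions: the question's 02-01-2019 is read as 2 January 2019. ISO strings are used so that ordering by date also works on text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE financieel (date TEXT, amount INTEGER)")
# The sample rows from the question, with dates rewritten as ISO strings (D-M-Y assumed).
rows = [("2019-01-01", 10)] * 4 + [("2019-01-02", 5)] * 4 + [("2019-01-03", 20)] * 2
conn.executemany("INSERT INTO financieel VALUES (?, ?)", rows)

# Window version: total each date first, then take a running SUM over the daily totals.
WINDOW_SQL = """
SELECT date,
       SUM(daily) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM (SELECT date, SUM(amount) AS daily FROM financieel GROUP BY date) d
ORDER BY date
"""

# Correlated-subquery version: for each date, add up every amount on or before it.
SUBQUERY_SQL = """
SELECT a.date,
       (SELECT SUM(amount) FROM financieel WHERE date <= a.date) AS running_total
FROM financieel a
GROUP BY a.date
ORDER BY a.date
"""

window_rows = conn.execute(WINDOW_SQL).fetchall()
subquery_rows = conn.execute(SUBQUERY_SQL).fetchall()
print(window_rows)  # [('2019-01-01', 40), ('2019-01-02', 60), ('2019-01-03', 100)]
```

The subquery version rescans the table once per date, so the window version is usually the better fit for larger tables.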

Postgres: count number of rows and group them by timestamp

Let's assume I have a table in Postgres with just 2 columns:
ID, which is the table's primary key (bigint)
time, which is of type timestamp
Is there a way to count the IDs grouped by the year of time? For example, 18 February 2005 would fall in the 2005 group, so the result would be:
year | number of rows
1998 | 2
2005 | 5
And if the number of result rows is smaller than some number (for example 3), the query should return the result by month instead.
Something like:
month | number of rows
(February 2018) | 5
(March 2018) | 2
Is there a nice way to do this in PostgreSQL?
You can do it using window functions (as always).
I use this table:
TABLE times;
id | t
----+-------------------------------
1 | 2018-03-14 20:04:39.81298+01
2 | 2018-03-14 20:04:42.92462+01
3 | 2018-03-14 20:04:45.774615+01
4 | 2018-03-14 20:04:48.877038+01
5 | 2017-03-14 20:05:08.94096+01
6 | 2017-03-14 20:05:16.123736+01
7 | 2017-03-14 20:05:19.91982+01
8 | 2017-01-14 20:05:32.249175+01
9 | 2017-01-14 20:05:35.793645+01
10 | 2017-01-14 20:05:39.991486+01
11 | 2016-11-14 20:05:47.951472+01
12 | 2016-11-14 20:05:52.941504+01
13 | 2016-10-14 21:05:52.941504+02
(13 rows)
First, group by month (subquery per_month).
Then add the sum per year with a window function (subquery with_year).
Finally, use CASE to decide which one you will output and remove duplicates with DISTINCT.
SELECT DISTINCT
CASE WHEN yc > 5
THEN mc
ELSE yc
END AS count,
CASE WHEN yc > 5
THEN to_char(t, 'YYYY-MM')
ELSE to_char(t, 'YYYY')
END AS period
FROM (SELECT
mc,
sum(mc) OVER (PARTITION BY date_trunc('year', t)) AS yc,
t
FROM (SELECT
count(*) AS mc,
date_trunc('month', t) AS t
FROM times
GROUP BY date_trunc('month', t)
) per_month
) with_year
ORDER BY 2;
count | period
-------+---------
3 | 2016
3 | 2017-01
3 | 2017-03
4 | 2018
(4 rows)
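The same three steps translate to SQLite, run through Python's sqlite3, with two substitutions:
* strftime('%Y-%m', t) replaces date_trunc('month', t).
* substr(mon, 1, 4) extracts the year.

The timestamps are the ones from the table above, with the time-zone offsets and fractional seconds dropped. That is an assumption, since SQLite has no timestamptz type.

```python
import sqlite3

TIMES = [
    "2018-03-14 20:04:39", "2018-03-14 20:04:42", "2018-03-14 20:04:45", "2018-03-14 20:04:48",
    "2017-03-14 20:05:08", "2017-03-14 20:05:16", "2017-03-14 20:05:19",
    "2017-01-14 20:05:32", "2017-01-14 20:05:35", "2017-01-14 20:05:39",
    "2016-11-14 20:05:47", "2016-11-14 20:05:52",
    "2016-10-14 21:05:52",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE times (id INTEGER PRIMARY KEY, t TEXT)")
conn.executemany("INSERT INTO times (id, t) VALUES (?, ?)", list(enumerate(TIMES, start=1)))

SQL = """
SELECT DISTINCT
       CASE WHEN yc > 5 THEN mc ELSE yc END AS cnt,              -- monthly or yearly count
       CASE WHEN yc > 5 THEN mon ELSE substr(mon, 1, 4) END AS period
FROM (SELECT mc, mon,
             SUM(mc) OVER (PARTITION BY substr(mon, 1, 4)) AS yc  -- rows per year
      FROM (SELECT COUNT(*) AS mc, strftime('%Y-%m', t) AS mon    -- rows per month
            FROM times
            GROUP BY strftime('%Y-%m', t)) per_month) with_year
ORDER BY 2
"""

result = conn.execute(SQL).fetchall()
print(result)  # [(3, '2016'), (3, '2017-01'), (3, '2017-03'), (4, '2018')]
```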
Just count the distinct years. If there are at least 3, group by year; otherwise group by month:
select
case when (select count(distinct extract(year from time)) from mytable) >= 3 then
to_char(time, 'yyyy')
else
to_char(time, 'yyyy-mm')
end as season,
count(*)
from mytable
group by season
order by season;
(Unlike many other DBMSs, PostgreSQL allows output-column aliases in the GROUP BY clause.)
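Here is a sketch of the count-the-years approach in SQLite, run through Python's sqlite3. It reuses the timestamps from the first answer without time zones. The threshold is a named parameter of a hypothetical season_counts helper, so both branches can be exercised. SQLite, like PostgreSQL, accepts the season alias in GROUP BY.

```python
import sqlite3

TIMES = [
    "2018-03-14 20:04:39", "2018-03-14 20:04:42", "2018-03-14 20:04:45", "2018-03-14 20:04:48",
    "2017-03-14 20:05:08", "2017-03-14 20:05:16", "2017-03-14 20:05:19",
    "2017-01-14 20:05:32", "2017-01-14 20:05:35", "2017-01-14 20:05:39",
    "2016-11-14 20:05:47", "2016-11-14 20:05:52",
    "2016-10-14 21:05:52",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE times (id INTEGER PRIMARY KEY, t TEXT)")
conn.executemany("INSERT INTO times (id, t) VALUES (?, ?)", list(enumerate(TIMES, start=1)))

# The scalar subquery counts distinct years once; the CASE then picks the grouping key.
SEASON_SQL = """
SELECT CASE WHEN (SELECT COUNT(DISTINCT strftime('%Y', t)) FROM times) >= :min_years
            THEN strftime('%Y', t)
            ELSE strftime('%Y-%m', t)
       END AS season,
       COUNT(*) AS n
FROM times
GROUP BY season
ORDER BY season
"""


def season_counts(conn, min_years):
    """Hypothetical helper: group by year if there are at least min_years years, else by month."""
    return conn.execute(SEASON_SQL, {"min_years": min_years}).fetchall()


by_year = season_counts(conn, 3)   # 3 distinct years -> yearly buckets
by_month = season_counts(conn, 4)  # fewer than 4 years -> monthly buckets
print(by_year)  # [('2016', 3), ('2017', 6), ('2018', 4)]
```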

Using an outer query result in a subquery in PostgreSQL

I have two tables, points and contacts, and I'm trying to get the average points.score per contact on a monthly basis. Note that points and contacts aren't related; I just want the sum of points created in a month divided by the number of contacts that existed in that month.
So I need to sum points grouped by the created_at month, and I need the count of contacts FOR THAT MONTH ONLY. It's that last part that's tripping me up. I'm not sure how to use a column from an outer query in the subquery. I tried something like this:
SELECT SUM(score) AS points_sum,
EXTRACT(month FROM created_at) AS month,
date_trunc('MONTH', created_at) + INTERVAL '1 month' AS next_month,
(SELECT COUNT(id) FROM contacts WHERE contacts.created_at <= next_month) as contact_count
FROM points
GROUP BY month, next_month
ORDER BY month
So I'm extracting the month in which my points are summed and, at the same time, getting the beginning of next_month, so that I can say "Get me the count of contacts whose created_at is < next_month".
But it complains that column next_month doesn't exist. This is understandable, as the subquery knows nothing about the outer query. Qualifying it as points.next_month doesn't work either.
So can someone point me in the right direction on how to achieve this?
Tables:
Points
score | created_at
10 | "2011-11-15 21:44:00.363423"
11 | "2011-10-15 21:44:00.69667"
12 | "2011-09-15 21:44:00.773289"
13 | "2011-08-15 21:44:00.848838"
14 | "2011-07-15 21:44:00.924152"
Contacts
id | created_at
6 | "2011-07-15 21:43:17.534777"
5 | "2011-08-15 21:43:17.520828"
4 | "2011-09-15 21:43:17.506452"
3 | "2011-10-15 21:43:17.491848"
1 | "2011-11-15 21:42:54.759225"
Output of sum, month and next_month (without the subselect):
sum | month | next_month
14 | 7 | "2011-08-01 00:00:00"
13 | 8 | "2011-09-01 00:00:00"
12 | 9 | "2011-10-01 00:00:00"
11 | 10 | "2011-11-01 00:00:00"
10 | 11 | "2011-12-01 00:00:00"
Edit
Now with a running sum of contacts. My first draft used new contacts per month, which is obviously not what the OP wants.
WITH c AS (
SELECT created_at
,count(id) OVER (order BY created_at) AS ct
FROM contacts
), p AS (
SELECT date_trunc('month', created_at) AS month
,sum(score) AS points_sum
FROM points
GROUP BY 1
)
SELECT p.month
,EXTRACT(month FROM p.month) AS month_nr
,p.points_sum
,( SELECT c.ct
FROM c
WHERE c.created_at < (p.month + interval '1 month')
ORDER BY c.created_at DESC
LIMIT 1) AS contacts
FROM p
ORDER BY 1
This works for any number of months across the years.
This assumes that no month is missing from the points table. If you want all months, including ones missing from points, generate a list of months with generate_series() and LEFT JOIN to it.
Build a running sum in a CTE with a window function.
Neither CTE is strictly necessary; they are there only for performance and simplicity.
Get contacts_count in a subselect.
Your original form of the query could work like this:
SELECT month
,EXTRACT(month FROM month) AS month_nr
,points_sum
,(SELECT count(*)
FROM contacts c
WHERE c.created_at < (p.month + interval '1 month')) AS contact_count
FROM (
SELECT date_trunc('MONTH', created_at) AS month
,sum(score) AS points_sum
FROM points p
GROUP BY 1
) p
ORDER BY 1
The fix for the immediate cause of your error is to put the aggregate into a subquery. You were mixing query levels in a way that is not possible.
I expect my variant to be slightly faster with big tables; I'm not sure about small ones. It would be great if you could report back with test results.
Plus a minor fix: < instead of <=.
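Both versions can be compared on the sample tables from the question with a small SQLite sketch, run through Python's sqlite3. SQLite has no date_trunc or interval arithmetic, so date(created_at, 'start of month') and date(month, '+1 month') stand in for them. The final division into points per contact is done in Python; it is an addition, not part of either answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (score INTEGER, created_at TEXT)")
conn.execute("CREATE TABLE contacts (id INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO points VALUES (?, ?)", [
    (10, "2011-11-15 21:44:00.363423"), (11, "2011-10-15 21:44:00.69667"),
    (12, "2011-09-15 21:44:00.773289"), (13, "2011-08-15 21:44:00.848838"),
    (14, "2011-07-15 21:44:00.924152"),
])
conn.executemany("INSERT INTO contacts VALUES (?, ?)", [
    (6, "2011-07-15 21:43:17.534777"), (5, "2011-08-15 21:43:17.520828"),
    (4, "2011-09-15 21:43:17.506452"), (3, "2011-10-15 21:43:17.491848"),
    (1, "2011-11-15 21:42:54.759225"),
])

# CTE variant: running count of contacts, then the latest value before each month's end.
CTE_SQL = """
WITH c AS (
    SELECT created_at, COUNT(id) OVER (ORDER BY created_at) AS ct
    FROM contacts
), p AS (
    SELECT date(created_at, 'start of month') AS month, SUM(score) AS points_sum
    FROM points
    GROUP BY 1
)
SELECT p.month, p.points_sum,
       (SELECT c.ct FROM c
        WHERE c.created_at < date(p.month, '+1 month')
        ORDER BY c.created_at DESC
        LIMIT 1) AS contacts
FROM p
ORDER BY 1
"""

# Original form: aggregate in a derived table, then count contacts in a correlated subquery.
SUBQUERY_SQL = """
SELECT month, points_sum,
       (SELECT COUNT(*) FROM contacts c
        WHERE c.created_at < date(p.month, '+1 month')) AS contact_count
FROM (SELECT date(created_at, 'start of month') AS month, SUM(score) AS points_sum
      FROM points
      GROUP BY 1) p
ORDER BY 1
"""

cte_rows = conn.execute(CTE_SQL).fetchall()
subquery_rows = conn.execute(SUBQUERY_SQL).fetchall()
# Points per contact, the figure the question is ultimately after.
avg_per_contact = [(month, total / n) for month, total, n in cte_rows]
print(avg_per_contact)
```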