Find the number of days a user was using the app

I would like to write a SQL query that counts the number of days each user used the application, and how many of those days were consecutive. A user can open the app several times a day, but that should count as one day.
My table looks like this:
id | bigint
user_id | bigint
action_date | timestamp without time zone

To count the number of days per user:
SELECT user_id, count(DISTINCT action_date::date) AS days
FROM user_action_tbl
GROUP BY user_id;

One way to do it:
SELECT user_id, COUNT(*) days_total, SUM(consecutive) days_consecutive
FROM
(
SELECT user_id,
CASE WHEN LEAD(date, 1) OVER (PARTITION BY user_id ORDER BY date) - date = 1 THEN 1 ELSE 0 END consecutive
FROM
(
SELECT user_id, action_date::date date
FROM user_action_tbl
GROUP BY user_id, action_date::date
) q
) p
GROUP BY user_id
Here is a SQLFiddle demo
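To check the idea on a small dataset, here is a minimal sketch that runs a query of the same shape from Python against an in-memory SQLite database. SQLite only stands in for PostgreSQL here, so date(action_date) replaces action_date::date and a julianday() difference replaces date subtraction; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_action_tbl (id INTEGER, user_id INTEGER, action_date TEXT)"
)
conn.executemany(
    "INSERT INTO user_action_tbl VALUES (?, ?, ?)",
    [
        (1, 1, "2024-01-01 09:00:00"),
        (2, 1, "2024-01-01 18:30:00"),  # second visit on the same day
        (3, 1, "2024-01-02 10:00:00"),
        (4, 1, "2024-01-03 08:00:00"),
        (5, 1, "2024-01-07 12:00:00"),
        (6, 2, "2024-01-05 12:00:00"),
        (7, 2, "2024-01-09 12:00:00"),
    ],
)

query = """
SELECT user_id,
       COUNT(*)    AS days_total,
       SUM(conseq) AS days_consecutive
FROM (
    -- flag each day that is directly followed by another active day
    SELECT user_id,
           CASE WHEN julianday(LEAD(d) OVER (PARTITION BY user_id ORDER BY d))
                     - julianday(d) = 1
                THEN 1 ELSE 0 END AS conseq
    FROM (
        -- one row per user and calendar day
        SELECT DISTINCT user_id, date(action_date) AS d
        FROM user_action_tbl
    ) AS q
) AS p
GROUP BY user_id
ORDER BY user_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Note that days_consecutive counts days that are immediately followed by another active day, so a three-day streak contributes 2, not 3.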

Related

SQL to find when amount reached a certain value for the first time

I have a table that has 3 columns: user_id, date, amount. I need to find out on which date the amount reached 1 Million for the first time. The amount can go up or down on any given day.
I tried using partition by user_id order by date desc but I can't figure out how to find the exact date on which it reached 1 Million for the first time. I am exploring lead, lag functions. Any pointers would be appreciated.
You can use conditional aggregation as follows:
select user_id,
min(case when amount >= 1000000 then date end) as expected_date
from table_name
group by user_id
If you want the date on which the amount is exactly 1M, use case when amount = 1000000 ... instead.
If the amount is a daily change and you need the running (cumulative) total ordered by date, the query becomes:
select user_id,
min(case when cumulative_amount >= 1000000 then date end) as expected_date
from
(
select *,
sum(amount) over (partition by user_id order by date) cumulative_amount
from table_name
) T
group by user_id;
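As a quick sanity check of the cumulative version, here is a small sketch in Python with an in-memory SQLite database (SQLite 3.25 or later is assumed for window functions). The table and column names follow the answer; the sample amounts are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (user_id INTEGER, date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 400000),
        (1, "2024-01-02", 700000),   # running total 1,100,000: first hit
        (1, "2024-01-03", -300000),  # drops back to 800,000
        (1, "2024-01-04", 500000),   # 1,300,000 again
        (2, "2024-01-01", 999999),
        (2, "2024-01-02", 0),        # never reaches 1M
    ],
)

query = """
SELECT user_id,
       MIN(CASE WHEN cumulative_amount >= 1000000 THEN date END) AS expected_date
FROM (
    SELECT user_id, date,
           SUM(amount) OVER (PARTITION BY user_id ORDER BY date) AS cumulative_amount
    FROM table_name
) AS t
GROUP BY user_id
ORDER BY user_id
"""
first_hit = conn.execute(query).fetchall()
print(first_hit)
```

A user who never reaches the threshold still appears in the result, with a NULL (None in Python) date.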
Try this:
select date,
sum(amount) as totalamount
from tablename
group by date
having sum(amount)>=1000000
order by date asc
limit 1
This sums the amount for each day and returns one record: the first day on which the daily total reached 1M.
Sample result on SQL Fiddle.
To group by both date and user_id, add user_id to the select and group by clauses:
select user_id, date,
sum(amount) as totalamount
from tablename
group by user_id,date
having sum(amount)>=1000000
order by date asc
limit 1
Example here.

Daily count of user_ids who have visited my store 4 or more than 4 times every day

I have a table of user_ids that have visited my platform. I want a count of only those user IDs that visited my store 4 or more times per day, on every day of a 10-day period.
To achieve this I am using this query:
select date(arrival_timestamp), count(user_id)
from mytable
where date(arrival_timestamp) >= current_date-10
and date(arrival_timestamp) < current_date
group by 1
having count(user_id)>=4
order by 2 desc
limit 10;
But this query picks up virtually all users with a count of 4 or more overall, not on a daily basis, which covers almost every user. As a result I cannot single out the users who visit my store repeatedly on a particular day. Any help in this regard is appreciated.
Thanks
You can try this:
with list as (
select user_id, count(*) as user_count, array_agg(arrival_timestamp) as arrival_timestamp
from mytable
where date(arrival_timestamp) >= current_date-10
and date(arrival_timestamp) < current_date
group by user_id)
select user_id, unnest(arrival_timestamp)
from list
where user_count >= 4
From the list of users who visited your store 4 or more times on a given day over the last 10 days (the inner query), select those that have 10 occurrences, i.e. one for every day.
select user_id
from
(
select user_id
from the_table
where arrival_timestamp::date between current_date - 10 and current_date - 1
group by user_id, arrival_timestamp::date
having count(*) >= 4
) t
group by user_id
having count(*) = 10;
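The two-level grouping can be tried out with the sketch below, written in Python against in-memory SQLite. To keep the result reproducible, a fixed date (the hypothetical variable today) is passed in instead of current_date, and SQLite's date() modifiers replace the PostgreSQL date arithmetic. The generated visits are sample data only.

```python
import sqlite3
from datetime import date, timedelta

today = date(2024, 3, 11)  # fixed stand-in for current_date (assumption)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (user_id INTEGER, arrival_timestamp TEXT)")

visits = []
for offset in range(1, 11):  # the 10 days before `today`
    day = today - timedelta(days=offset)
    # user 1: 4 visits every day; user 2: only 3 visits on one day; user 3: 6 every day
    for user_id, n in ((1, 4), (2, 3 if offset == 5 else 4), (3, 6)):
        for hour in range(n):
            visits.append((user_id, f"{day.isoformat()} {8 + hour:02d}:00:00"))
conn.executemany("INSERT INTO the_table VALUES (?, ?)", visits)

query = """
SELECT user_id
FROM (
    -- one row per (user, day) on which the user came 4 or more times
    SELECT user_id
    FROM the_table
    WHERE date(arrival_timestamp)
          BETWEEN date(:today, '-10 days') AND date(:today, '-1 day')
    GROUP BY user_id, date(arrival_timestamp)
    HAVING COUNT(*) >= 4
) AS t
GROUP BY user_id
HAVING COUNT(*) = 10   -- qualified on all 10 days
ORDER BY user_id
"""
daily_regulars = [r[0] for r in conn.execute(query, {"today": today.isoformat()})]
print(daily_regulars)
```

User 2 drops out because a single day with only 3 visits leaves 9 qualifying days instead of 10.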

How to get the average of the number of actions per day

I have written the sql query:
SELECT id,
date_diff("day", create_date, date) as day,
action_type
FROM "my_database"
It returns this:
id day action_type
1 0 upload
1 0 upload
1 0 upload
1 1 upload
1 1 upload
2 0 upload
2 0 upload
2 1 upload
How can I change my query to get a table with the unique days in column day and the average number of "upload" actions per id for each day? The desired result looks like this:
day avg_num_action
0 2.5
1 1.5
It is 2.5, because (3+2)/2 (3 uploads of id:1 and 2 uploads for id:2). same for 1.5
Please try this. Treat your given query as a derived table. If you do not need a filter on action_type, drop the WHERE clause.
SELECT t.day
, COUNT(*) * 1.0 / COUNT(DISTINCT t.id) avg_num_action
FROM (SELECT id,
date_diff("day", create_date, date) as day,
action_type
FROM "my_database") t
WHERE t.action_type = 'upload'
GROUP BY t.day
Alternatively, create a table from your given result set and write the query against that:
SELECT t.tday
, COUNT(*) / COUNT(DISTINCT t.id) avg_num_action
FROM my_database t
GROUP BY t.tday
Please check from url https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=871935ea2b919c4e24eb83fcbce78973
Update: I think my two-step approach is more complicated than needed. Rahul Biswas shows how this can be done in one step. I suggest you use and accept his answer.
Original answer:
Two steps:
Count entries per ID and day
Take the average count per day
The query:
with rows as (select id, date_diff('day', create_date, date) as day from mytable)
, per_id_and_day as (select id, day, count(*) as cnt from rows group by id, day)
select day, avg(cnt)
from per_id_and_day
group by day
order by day;
You don't need a subquery for this logic:
SELECT date_diff("day", create_date, date) as day,
COUNT(*) * 1.0 / COUNT(DISTINCT id)
FROM "my_database"
GROUP BY date_diff("day", create_date, date)
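The effect of the * 1.0 factor is easy to see on the sample data from the question. The sketch below uses Python with in-memory SQLite, where dividing two integers truncates; the table name actions is hypothetical and the day column is assumed to be already computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actions (id INTEGER, day INTEGER, action_type TEXT)")
conn.executemany(
    "INSERT INTO actions VALUES (?, ?, 'upload')",
    # (id, day) pairs copied from the result set in the question
    [(1, 0), (1, 0), (1, 0), (1, 1), (1, 1), (2, 0), (2, 0), (2, 1)],
)

query = """
SELECT day,
       COUNT(*) / COUNT(DISTINCT id)       AS truncated_avg,  -- integer division
       COUNT(*) * 1.0 / COUNT(DISTINCT id) AS avg_num_action
FROM actions
WHERE action_type = 'upload'
GROUP BY day
ORDER BY day
"""
averages = conn.execute(query).fetchall()
print(averages)
```

Whether plain / truncates depends on the engine (MySQL returns a decimal, for example), so multiplying by 1.0 is the safer habit.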

Hive/SQL How do you access the value of the column which you just computed for previous rows?

I have a table uv_user_date that looks like this:
It's basically a user login table which shows the cumulative login days partitioned by user_id.
The column pre shows the previous login date for each login record.
Based on this I want to compute the consecutive login days for each user record.
The answer should be :
My idea is : for a record
if(uv_date - pre = 1 day)
then consecutive login days is the last consecutive login days + 1
else
1
but I am having trouble with accessing the last consecutive login days value.
The code would be something like:
SELECT *,
if(pre = date_add(uv_date, -1), last(consecutive_days) + 1, 1) consecutive_days
FROM uv_user_date
Is there any way to get the value of last(consecutive_days)?
First, find the date difference.
tbl1:
select *,
if(pre is null, 1, datediff(uv_date, pre)) as diff
from your_table
Then take the difference between the running sum of diff and accumulative_uv_date for each user_id. It stays constant within a run of consecutive days, so it can serve as a group key (rnk).
tbl2:
select *,
sum(diff) over (partition by user_id order by uv_date rows between unbounded preceding and current row) - accumulative_uv_date as rnk
from tbl1
Finally, count the consecutive days within each group:
select user_id, uv_date, rnk,
row_number() over (partition by user_id, rnk order by uv_date) as consecutive_days
from tbl2
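The three steps can be chained as CTEs and tried end to end. This is only a sketch in Python with in-memory SQLite rather than Hive: julianday() differences stand in for datediff, and the pre and accumulative_uv_date columns are derived here with LAG and ROW_NUMBER because the sample table (invented data) only stores user_id and uv_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uv_user_date (user_id INTEGER, uv_date TEXT)")
conn.executemany(
    "INSERT INTO uv_user_date VALUES (?, ?)",
    [
        (1, "2024-01-01"), (1, "2024-01-02"), (1, "2024-01-03"),
        (1, "2024-01-05"), (1, "2024-01-06"),  # new streak after a gap
        (2, "2024-01-10"),
    ],
)

query = """
WITH base AS (
    SELECT user_id, uv_date,
           LAG(uv_date)  OVER (PARTITION BY user_id ORDER BY uv_date) AS pre,
           ROW_NUMBER()  OVER (PARTITION BY user_id ORDER BY uv_date) AS accumulative_uv_date
    FROM uv_user_date
),
tbl1 AS (   -- step 1: days since the previous login (1 for the first login)
    SELECT *,
           CASE WHEN pre IS NULL THEN 1
                ELSE CAST(julianday(uv_date) - julianday(pre) AS INTEGER) END AS diff
    FROM base
),
tbl2 AS (   -- step 2: running sum of diff minus login ordinal = streak key
    SELECT *,
           SUM(diff) OVER (PARTITION BY user_id ORDER BY uv_date
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           - accumulative_uv_date AS rnk
    FROM tbl1
)
-- step 3: number the rows inside each streak
SELECT user_id, uv_date,
       ROW_NUMBER() OVER (PARTITION BY user_id, rnk ORDER BY uv_date) AS consecutive_days
FROM tbl2
ORDER BY user_id, uv_date
"""
streaks = conn.execute(query).fetchall()
print(streaks)
```

No access to the "previous computed value" is needed: the streak key turns the recursive definition into an ordinary window function over groups.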

Active customers for each day who were active in last 30 days

I have a BQ table, user_events that looks like the following:
event_date | user_id | event_type
Data is for Millions of users, for different event dates.
I want to write a query that will give me a list of users for every day who were active in last 30 days.
This gives me total unique users on only that day; I can't get it to give me the last 30 for each date. Help is appreciated.
SELECT
user_id,
event_date
FROM
[TableA]
WHERE
1=1
AND user_id IS NOT NULL
AND event_date >= DATE_ADD(CURRENT_TIMESTAMP(), -30, 'DAY')
GROUP BY
1,
2
ORDER BY
2 DESC
Below is for BigQuery Standard SQL and makes a few assumptions about your case:
there is only one row per date per user
a user is considered active in the last 30 days if the user has at least 5 entries/rows within those 30 days (this can be any number, even just 1)
If the above makes sense, see below:
#standardSQL
SELECT
user_id, event_date
FROM (
SELECT
user_id, event_date,
(COUNT(1)
OVER(PARTITION BY user_id
ORDER BY UNIX_DATE(event_date)
RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING)
) >= 5 AS activity
FROM `yourTable`
)
WHERE activity
GROUP BY user_id, event_date
-- ORDER BY event_date
If assumption #1 above is not correct, you can simply add pre-grouping as a sub-select:
#standardSQL
SELECT
user_id, event_date
FROM (
SELECT
user_id, event_date,
(COUNT(1)
OVER(PARTITION BY user_id
ORDER BY UNIX_DATE(event_date)
RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING)
) >= 5 AS activity
FROM (
SELECT user_id, event_date
FROM `yourTable`
GROUP BY user_id, event_date
)
)
WHERE activity
GROUP BY user_id, event_date
-- ORDER BY event_date
UPDATE
From comments: if a user has any event with event_type IN ('view', 'conversion', 'productDetail', 'search'), they are considered active. That means any kind of event triggered within the app.
So I think you can go with the below:
#standardSQL
SELECT
user_id, event_date
FROM (
SELECT
user_id, event_date,
(COUNT(1)
OVER(PARTITION BY user_id
ORDER BY UNIX_DATE(event_date)
RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING)
) >= 5 AS activity
FROM (
SELECT user_id, event_date
FROM `yourTable`
WHERE event_type IN ('view', 'conversion', 'productDetail', 'search')
GROUP BY user_id, event_date
)
)
WHERE activity
GROUP BY user_id, event_date
-- ORDER BY event_date
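For anyone who wants to experiment with the RANGE frame outside BigQuery, here is a rough sketch in Python with in-memory SQLite (3.28 or later is assumed for RANGE with numeric offsets). The integer part of julianday() plays the role of UNIX_DATE, the activity threshold is passed in as the hypothetical parameter min_days, and the events are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_events (event_date TEXT, user_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO user_events VALUES (?, ?, ?)",
    [
        ("2024-01-01", 1, "view"),
        ("2024-01-10", 1, "search"),
        ("2024-01-10", 1, "view"),        # duplicate day, removed by DISTINCT
        ("2024-01-20", 1, "conversion"),
        ("2024-03-01", 1, "view"),        # more than 30 days after the last event
        ("2024-01-05", 2, "view"),
    ],
)

query = """
SELECT user_id, event_date
FROM (
    SELECT user_id, event_date,
           -- distinct active days in the 30 days before this event date
           COUNT(*) OVER (
               PARTITION BY user_id
               ORDER BY CAST(julianday(event_date) AS INTEGER)
               RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING
           ) AS prior_active_days
    FROM (SELECT DISTINCT user_id, event_date FROM user_events)
)
WHERE prior_active_days >= :min_days
ORDER BY user_id, event_date
"""
active = conn.execute(query, {"min_days": 1}).fetchall()
print(active)
```

As in the queries above, a user is only listed on dates where they have an event of their own; dates without any event for that user produce no row.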