Find number of repeating visitors in a month - PostgreSQL - sql

I am using PostgreSQL and my data looks something like this:
UserID TimeStamp
1 2014-02-03
2 2014-02-03
3 2014-02-03
1 2014-03-03
2 2014-03-03
6 2014-03-03
7 2014-03-03
This is just dummy data for 2 days, with some UserIDs appearing on both days. I would like to find the number of repeated UserIDs for every month. For this example the final result set should look like:
Count Year Month
0 2014 2
2 2014 3
In the above table, March 2014 has 2 repeat UserIDs and Feb 2014 has none.
I can find out the distinct UserID for each month but not the repeated UserID. Any help in this regard would be much appreciated.

select
count(distinct userid) as "Count",
extract(year from t1.timestamp) as "Year", -- t1 is the later visit, t0 the earlier one
extract(month from t1.timestamp) as "Month"
from
t t1
inner join
t t0 using (userid)
where t0.timestamp < date_trunc('month', t1.timestamp)
group by 2, 3
Or, which may be faster:
select
count(distinct userid) as "Count",
extract(year from t1.timestamp) as "Year",
extract(month from t1.timestamp) as "Month"
from t t1
where exists (
select 1
from t
where
userid = t1.userid
and
timestamp < date_trunc('month', t1.timestamp)
)
group by 2, 3
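As a quick sanity check of the EXISTS idea against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration, not the PostgreSQL query itself: SQLite has no date_trunc, so strftime('%Y-%m-01', ...) stands in for it, and the table name t and column name ts are assumptions. It groups by the month of the later visit and sums a per-user flag, so months without repeat visitors show up as 0.

```python
import sqlite3

# Sample rows from the question; table name "t" and column "ts" are assumed.
ROWS = [
    (1, "2014-02-03"), (2, "2014-02-03"), (3, "2014-02-03"),
    (1, "2014-03-03"), (2, "2014-03-03"), (6, "2014-03-03"), (7, "2014-03-03"),
]

QUERY = """
SELECT SUM(is_repeat) AS cnt, yr, mon
FROM (
    -- one row per user and month, flagged 1 if the user has an earlier-month visit
    SELECT DISTINCT
        t1.userid,
        CAST(strftime('%Y', t1.ts) AS INTEGER) AS yr,
        CAST(strftime('%m', t1.ts) AS INTEGER) AS mon,
        EXISTS (
            SELECT 1
            FROM t AS t0
            WHERE t0.userid = t1.userid
              -- strftime('%Y-%m-01', ...) plays the role of date_trunc('month', ...)
              AND t0.ts < strftime('%Y-%m-01', t1.ts)
        ) AS is_repeat
    FROM t AS t1
) AS per_user_month
GROUP BY yr, mon
ORDER BY yr, mon
"""

def repeat_visitors(rows):
    """Return (count, year, month) tuples for the given (userid, date) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (userid INTEGER, ts TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(repeat_visitors(ROWS))
```

On the sample data this gives 0 for February 2014 and 2 for March 2014, which matches the result set asked for in the question.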

This might work; I have not tested it yet.
SELECT
COUNT(DISTINCT UserId) AS "Count" -- distinct users per month
, EXTRACT(YEAR FROM TimeStamp) AS "Year"
, EXTRACT(MONTH FROM TimeStamp) AS "Month"
FROM your_table
GROUP BY 2, 3

To rephrase your question:
How many users are not new (i.e. already visited the shop/website/whatever in a previous month) for each month?
SELECT
yr, mon,
COUNT(*) AS all_users,
COUNT(*) - SUM(repeated) AS new_users,
SUM(repeated) AS existing_users
FROM
(
SELECT UserId,
EXTRACT(YEAR FROM TimeStamp) AS yr,
EXTRACT(MONTH FROM TimeStamp) AS mon,
CASE WHEN ROW_NUMBER() -- 1st time users get 0
OVER (PARTITION BY UserId
ORDER BY EXTRACT(YEAR FROM TimeStamp) ,
EXTRACT(MONTH FROM TimeStamp)) = 1
THEN 0
ELSE 1
END AS repeated
FROM vt
GROUP BY UserId,
EXTRACT(YEAR FROM TimeStamp),
EXTRACT(MONTH FROM TimeStamp)
) AS dt
GROUP BY yr,mon
ORDER BY 1,2
The inner GROUP BY is needed if there are multiple rows for a user within the same month.
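To see why the de-duplication per user and month matters, here is a small sketch that runs the same idea through Python's sqlite3 module. It is an approximation of the query above, not a copy: SQLite needs version 3.25 or newer for ROW_NUMBER(), strftime replaces EXTRACT, the column name ts is an assumption, and the extra March visit for user 1 is made-up data added to exercise the duplicate case.

```python
import sqlite3

# Question's sample rows plus one invented duplicate visit for user 1 in March.
ROWS = [
    (1, "2014-02-03"), (2, "2014-02-03"), (3, "2014-02-03"),
    (1, "2014-03-03"), (1, "2014-03-15"),
    (2, "2014-03-03"), (6, "2014-03-03"), (7, "2014-03-03"),
]

QUERY = """
WITH user_months AS (
    -- one row per user and month, the job of the inner GROUP BY
    SELECT DISTINCT userid, strftime('%Y-%m', ts) AS ym
    FROM vt
),
flagged AS (
    -- a user's first month gets 0, every later month gets 1
    SELECT ym,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY userid ORDER BY ym) = 1
                THEN 0 ELSE 1 END AS repeated
    FROM user_months
)
SELECT ym,
       COUNT(*)                 AS all_users,
       COUNT(*) - SUM(repeated) AS new_users,
       SUM(repeated)            AS existing_users
FROM flagged
GROUP BY ym
ORDER BY ym
"""

def new_vs_existing(rows):
    """Return (month, all_users, new_users, existing_users) per month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vt (userid INTEGER, ts TEXT)")
    conn.executemany("INSERT INTO vt VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(new_vs_existing(ROWS))
```

Without the DISTINCT step, the second March visit of user 1 would get its own row number and be counted as an additional existing user.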

Is this what you want?
select yyyymm, sum(case when cnt > 1 then 1 else 0 end) as dupcnt
from (select to_char(timestamp, 'YYYY-MM') as yyyymm, userid, count(*) as cnt
from table t
group by to_char(timestamp, 'YYYY-MM'), userid
) t
group by yyyymm
order by yyyymm;

Related

SQL Bigquery Counting repeated customers from transaction table

I have a transaction table that looks something like this.
userid orderDate amount
111 2021-11-01 20
112 2021-09-07 17
111 2021-11-21 17
I want to count how many distinct customers (userid) that bought from our store this month also bought from our store in the previous month. For example, in February 2020, we had 20 customers and out of these 20 customers 7 of them also bought from our store in the previous month, January 2020. I want to do this for all the previous months so ending up with something like.
year month repeated customers
2020 01 11
2020 02 7
2020 03 9
I have written this, but it only works for the current month. How would I iterate or rewrite it to get the table shown above?
WITH CURRENT_PERIOD AS (
SELECT DISTINCT userid
FROM table1
WHERE DATE(orderDate) BETWEEN DATE_TRUNC(CURRENT_DATE(),MONTH) AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
),
PREVIOUS_PERIOD AS (
SELECT DISTINCT userid
FROM table1
WHERE DATE(orderDate) BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH),MONTH) AND LAST_DAY(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH))
)
SELECT count(1)
FROM CURRENT_PERIOD RC
WHERE RC.userid IN (SELECT DISTINCT userid FROM PREVIOUS_PERIOD)
You can summarize to get one record per month, use lag(), and then aggregate:
select yyyymm,
countif(prev_yyyymm = date_add(yyyymm, interval -1 month)) as repeated_customers
from (select userid, date_trunc(orderDate, month) as yyyymm,
lag(date_trunc(orderDate, month)) over (partition by userid order by date_trunc(orderDate, month)) as prev_yyyymm
from table1
group by 1, 2
) t
group by yyyymm
order by yyyymm;
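For anyone who wants to try the lag() idea without a BigQuery project, here is a rough sketch using Python's sqlite3 module. It is not BigQuery SQL: strftime('%Y-%m-01', ...) stands in for date_trunc(..., month), date(ym, '-1 month') stands in for date_add, a SUM over a CASE stands in for countif, and LAG() needs SQLite 3.25 or newer. The rows are invented for illustration (only some of them come from the question's sample).

```python
import sqlite3

# (userid, orderDate) pairs; mostly made-up data for illustration.
ORDERS = [
    (111, "2021-09-15"), (112, "2021-09-07"),
    (111, "2021-10-02"), (113, "2021-10-20"),
    (111, "2021-11-01"), (111, "2021-11-21"), (112, "2021-11-05"),
]

QUERY = """
WITH user_months AS (
    -- one row per customer and month
    SELECT DISTINCT userid, strftime('%Y-%m-01', orderDate) AS ym
    FROM table1
),
with_prev AS (
    -- previous month in which the same customer ordered, if any
    SELECT userid, ym,
           LAG(ym) OVER (PARTITION BY userid ORDER BY ym) AS prev_ym
    FROM user_months
)
SELECT ym,
       SUM(CASE WHEN prev_ym = date(ym, '-1 month') THEN 1 ELSE 0 END)
           AS repeated_customers
FROM with_prev
GROUP BY ym
ORDER BY ym
"""

def repeated_customers(orders):
    """Return (month_start, repeated_customers) per month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (userid INTEGER, orderDate TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", orders)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(repeated_customers(ORDERS))
```

Customer 112 ordered in September and November but not in October, so it does not count as repeated for November; only an order in the immediately preceding month counts.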

How to find users who made an order in any year then didn't make one the year after

My table schema looks like this:
id | user_id | price | date
1235085 | 429009 | 1301.3 | 2016-01-01
1235016 | 1106100 | 2343.6 | 2016-01-01
1235007 | 707164 | 980.7 | 2016-01-01
There are 20 million records.
I have to find users who placed orders in some year but placed none in the following year.
I tried this query:
select user_id
from orders o1
where not exists (select user_id from orders o2
where extract(year from o2.date) + 1 > extract(year from o1.date))
but it doesn't work.
Use EXCEPT:
select distinct user_id from orders
except
select distinct user_id
from orders o1
where exists(
select 1
from orders o2
where o2.user_id = o1.user_id
and extract(year from o2.date) + 1 = extract(year from o1.date)
)
Here is one method:
select user_id, yyyy
from (select user_id, date_trunc('year', date) as yyyy,
lead(date_trunc('year', date)) over (partition by user_id order by date_trunc('year', date)) as next_year
from t
group by user_id, yyyy
) u
where next_year <> yyyy + interval '1 year' or next_year is null;
This assumes that you actually want the year as well. If not, use select distinct user_id.
You might also want to add the condition yyyy <> date_trunc('year', now()) so you don't get users who made their first purchase this year. Without this condition, I think you will return all users, because every user has a "last purchase" with no purchases the following year.
EDIT:
Interestingly, you can also do this with lead() alone, without the aggregation:
select user_id, date
from (select t.*, lead(date) over (partition by user_id order by date) as next_date
from t
) t
where (next_date is null or
extract(year from next_date) <> extract(year from date) + 1
) and
date < date_trunc('year', now());
Because lead() orders the values, this should return at most one value for a given year, even when there are multiple orders in a year.
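Here is a small sketch of the lead() approach that can be run locally with Python's sqlite3 module. It is an illustration under assumptions: strftime('%Y', ...) replaces date_trunc('year', ...), the current year is passed in as a parameter (current_year) instead of using now() so the result is reproducible, LEAD() needs SQLite 3.25 or newer, and the rows beyond the three user ids from the question are invented.

```python
import sqlite3

# (user_id, date) pairs; the later orders are made-up data for illustration.
ORDERS = [
    (429009, "2016-01-01"), (429009, "2017-03-10"),
    (1106100, "2016-01-01"), (1106100, "2018-05-05"),
    (707164, "2016-01-01"),
]

QUERY = """
WITH user_years AS (
    -- one row per user and year with at least one order
    SELECT DISTINCT user_id, CAST(strftime('%Y', "date") AS INTEGER) AS yr
    FROM orders
),
with_next AS (
    -- next year in which the same user ordered, if any
    SELECT user_id, yr,
           LEAD(yr) OVER (PARTITION BY user_id ORDER BY yr) AS next_yr
    FROM user_years
)
SELECT user_id, yr
FROM with_next
WHERE (next_yr IS NULL OR next_yr <> yr + 1)
  AND yr < :current_year   -- skip the year that is still in progress
ORDER BY user_id, yr
"""

def lapsed_users(orders, current_year):
    """Return (user_id, year) pairs where the user did not order in year + 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE orders (user_id INTEGER, "date" TEXT)')
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    result = conn.execute(QUERY, {"current_year": current_year}).fetchall()
    conn.close()
    return result

print(lapsed_users(ORDERS, 2018))
```

With 2018 treated as the current year, user 1106100 is reported for 2016 (nothing in 2017) but not for 2018, since that year is excluded by the cutoff.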

sql query group by day of month

I have a record table that includes the dates when rows were created (Oracle DB).
ID City CreateDate
1 city-1 12.12.2017
1 city-2 13.12.2017
1 city-1 13.12.2017
1 city-3 12.12.2017
....
....
I want to create a daily report for a month. For example, a City-1 report by day for December:
Day Count
1 10
2 80
3 60
4 10
...
30 11
I think you can use extract with count functions:
SELECT EXTRACT(day FROM CreateDate) "Day",
COUNT(CreateDate) "Number of Reports"
FROM yourTableName
GROUP BY EXTRACT(day FROM CreateDate)
ORDER BY "Number of Reports" ASC;
If I understood it correctly, the following query will generate the report you wanted for December.
SELECT EXTRACT(day FROM CreateDate) "Day",
COUNT(*) "Count"
FROM your_record_table
WHERE EXTRACT(month FROM CreateDate) = 12
GROUP BY EXTRACT(day FROM CreateDate)
ORDER BY EXTRACT(day FROM CreateDate);

postgreSQL- Count for value between previous month start date and end date

I have a table as follows
user_id date month year visiting_id
123 11-04-2017 APRIL 2017 4500
123 12-05-2017 MAY 2017 4567
123 13-05-2017 MAY 2017 4568
123 17-05-2017 MAY 2017 4569
123 22-05-2017 MAY 2017 4570
123 11-06-2017 JUNE 2017 4571
123 12-06-2017 JUNE 2017 4572
I want to calculate the visiting count for the current month and last month at the monthly level as follows:
user_id month year visit_count_this_month visit_count_last_month
123 APRIL 2017 1 0
123 MAY 2017 4 1
123 JUNE 2017 2 4
I was able to calculate visit_count_this_month using the following query
SELECT v.user_id, v.month, v.year,
SUM(is_visit_this_month) as visit_count_this_month
FROM
(SELECT user_id, date, month, year,
CASE WHEN TO_CHAR(date, 'MM/YYYY') = TO_CHAR(date, 'MM/YYYY')
THEN 1 ELSE 0
END as is_visit_this_month
FROM visits
GROUP BY user_id, date, month, year
HAVING user_id = 123) v
GROUP BY v.user_id, v.month, v.year
However, I'm stuck with calculating visit_count_last_month. Similar to this, I also want to calculate visit_count_last_2months.
Can somebody help?
You can use a LATERAL JOIN like this:
SELECT user_id, month, year, COUNT(*) as visit_count_this_month, visit_count_last_month
FROM visits v
CROSS JOIN LATERAL (
SELECT COUNT(*) as visit_count_last_month
FROM visits
WHERE user_id = v.user_id
AND date = (CAST(v.date AS date) - interval '1 month')
) l
GROUP BY user_id, month, year, visit_count_last_month;
SQLFiddle - http://sqlfiddle.com/#!15/393c8/2
Assuming there are values for every month, you can get the counts per month first and use lag to get the previous month's values per user.
SELECT T.*
,COALESCE(LAG(visits,1) OVER(PARTITION BY USER_ID ORDER BY year,mth),0) as last_month_visits
,COALESCE(LAG(visits,2) OVER(PARTITION BY USER_ID ORDER BY year,mth),0) as last_2_month_visits
FROM (
SELECT user_id, extract(month from date) as mth, year, COUNT(*) as visits
FROM visits
GROUP BY user_id, extract(month from date), year
) T
If there can be missing months, it is best to generate all months within a specified timeframe and left join the table onto that. (This example shows it for all the months in 2017.)
select user_id,yr,mth,visits
,coalesce(lag(visits,1) over(PARTITION BY USER_ID ORDER BY yr,mth),0) as last_month_visits
,coalesce(lag(visits,2) OVER(PARTITION BY USER_ID ORDER BY yr,mth),0) as last_2_month_visits
from (select u.user_id,extract(year from d.dt) as yr, extract(month from d.dt) as mth,count(v.visiting_id) as visits
from generate_series(date '2017-01-01', date '2017-12-31',interval '1 month') d(dt)
cross join (select distinct user_id from visits) u
left join visits v on extract(month from v.date)=extract(month from d.dt) and extract(year from v.date)=extract(year from d.dt) and u.user_id=v.user_id
group by u.user_id,extract(year from d.dt), extract(month from d.dt)
) t
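The effect of generating every month first can be seen in a plain Python sketch, shown here outside SQL on purpose. It assumes the sample rows from the question, and the function name monthly_counts and its (year, month) range arguments are made up for illustration. Months without visits get a count of 0, so looking one or two positions back always means one or two calendar months back, which is what lag() needs.

```python
from collections import Counter
from datetime import date

# (user_id, visit date) pairs from the question's sample table.
VISITS = [
    (123, date(2017, 4, 11)),
    (123, date(2017, 5, 12)), (123, date(2017, 5, 13)),
    (123, date(2017, 5, 17)), (123, date(2017, 5, 22)),
    (123, date(2017, 6, 11)), (123, date(2017, 6, 12)),
]

def monthly_counts(visits, start, end):
    """Rows of (user, year, month, visits, last_month, two_months_ago).

    start and end are inclusive (year, month) tuples.
    """
    counts = Counter((user, d.year, d.month) for user, d in visits)
    users = sorted({user for user, _ in visits})

    # Generate every month in the range, like generate_series in the query.
    months = []
    year, month = start
    while (year, month) <= end:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

    rows = []
    for user in users:
        # Counter returns 0 for months without visits (the left join + count).
        series = [counts[(user, y, m)] for y, m in months]
        for i, (y, m) in enumerate(months):
            last_1 = series[i - 1] if i >= 1 else 0  # lag(visits, 1)
            last_2 = series[i - 2] if i >= 2 else 0  # lag(visits, 2)
            rows.append((user, y, m, series[i], last_1, last_2))
    return rows

for row in monthly_counts(VISITS, (2017, 4), (2017, 7)):
    print(row)
```

Note that the second lag column is the count from two months ago, not the total of the last two months; add the two lag values if a combined figure is wanted.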

Need to find Average of top 3 records grouped by ID in SQL

I have a postgres table with customer ID's, dates, and integers. I need to find the average of the top 3 records for each customer ID that have dates within the last year. I can do it with a single ID using the SQL below (id is the customer ID, weekending is the date, and maxattached is the integer).
One caveat: the maximum values are per month, meaning we're only looking at the highest value in a given month to create our dataset, thus why we're extracting month from the date.
SELECT
id,
round(avg(max),0)
FROM
(
select
id,
extract(month from weekending) as month,
extract(year from weekending) as year,
max(maxattached) as max
FROM
myTable
WHERE
weekending >= now() - interval '1 year' AND
id=110070 group by id,month,year
ORDER BY
max desc limit 3
) AS t
GROUP BY id;
How can I expand this query to include all ID's and a single averaged number for each one?
Here is some sample data:
ID | MaxAttached | Weekending
110070 | 5 | 2011-11-10
110070 | 6 | 2011-11-17
110071 | 4 | 2011-11-10
110071 | 7 | 2011-11-17
110070 | 3 | 2011-12-01
110071 | 8 | 2011-12-01
110070 | 5 | 2012-01-01
110071 | 9 | 2012-01-01
So, for this sample table, I would expect to receive the following results:
ID | MaxAttached
110070 | 5
110071 | 8
This averages the highest value in a given month for each ID (6,3,5 for 110070 and 7,8,9 for 110071)
Note: postgres version 8.1.15
First - get the max(maxattached) for every customer and month:
SELECT id,
max(maxattached) as max_att
FROM myTable
WHERE weekending >= now() - interval '1 year'
GROUP BY id, date_trunc('month',weekending);
Next - for every customer rank all his values:
SELECT id,
max_att,
row_number() OVER (PARTITION BY id ORDER BY max_att DESC) as max_att_rank
FROM <previous select here>;
Next - get the top 3 for every customer:
SELECT id,
max_att
FROM <previous select here>
WHERE max_att_rank <= 3;
Next - get the avg of the values for every customer:
SELECT id,
avg(max_att) as avg_att
FROM <previous select here>
GROUP BY id;
Next - just put all the queries together and rewrite/simplify them for your case.
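Put together, the four steps could look roughly like the sketch below, run through Python's sqlite3 module on the sample data from the question. Several things here are assumptions: strftime('%Y-%m', ...) replaces date_trunc('month', ...), the reference date is passed in as a parameter (today) instead of now() so the old sample rows still fall inside the one-year window, and ROW_NUMBER() needs SQLite 3.25 or newer (so this form will not run on PostgreSQL 8.1, which has no window functions).

```python
import sqlite3

# (id, maxattached, weekending) rows from the question's sample data.
ROWS = [
    (110070, 5, "2011-11-10"), (110070, 6, "2011-11-17"),
    (110071, 4, "2011-11-10"), (110071, 7, "2011-11-17"),
    (110070, 3, "2011-12-01"), (110071, 8, "2011-12-01"),
    (110070, 5, "2012-01-01"), (110071, 9, "2012-01-01"),
]

QUERY = """
WITH monthly AS (
    -- step 1: highest value per customer and month within the last year
    SELECT id, MAX(maxattached) AS max_att
    FROM myTable
    WHERE weekending >= date(:today, '-1 year')
    GROUP BY id, strftime('%Y-%m', weekending)
),
ranked AS (
    -- step 2: rank the monthly maxima per customer
    SELECT id, max_att,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY max_att DESC)
               AS max_att_rank
    FROM monthly
)
-- steps 3 and 4: keep the top 3 per customer and average them
SELECT id, CAST(ROUND(AVG(max_att)) AS INTEGER) AS avg_att
FROM ranked
WHERE max_att_rank <= 3
GROUP BY id
ORDER BY id
"""

def top3_average(rows, today):
    """Return (id, rounded average of the top 3 monthly maxima) per customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (id INTEGER, maxattached INTEGER, weekending TEXT)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, {"today": today}).fetchall()
    conn.close()
    return result

print(top3_average(ROWS, "2012-02-01"))
```

For the sample data this averages 6, 3, 5 for 110070 and 7, 8, 9 for 110071, giving 5 and 8 as expected in the question.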
UPDATE: Here is an SQLFiddle with your test data and the queries: SQLFiddle.
UPDATE2: Here is a query that will work on 8.1:
SELECT customer_id,
(SELECT round(avg(max_att),0)
FROM (SELECT max(maxattached) as max_att
FROM table1
WHERE weekending >= now() - interval '2 year'
AND id = ct.customer_id
GROUP BY date_trunc('month',weekending)
ORDER BY max_att DESC
LIMIT 3) sub
) as avg_att
FROM customer_table ct;
The idea is to take your initial query and run it for every customer (customer_table is a table with all unique customer ids).
Here is SQLFiddle with this query: SQLFiddle.
Only tested on version 8.3 (8.1 is too old to be on SQLFiddle).
8.3 version
8.3 is the oldest version I've got access to, so I can't guarantee it'll work in 8.1
I'm using a temporary table to work out the best three records.
CREATE TABLE temp_highest_per_month as
select
id,
extract(month from weekending) as month,
extract(year from weekending) as year,
max(maxattached) as max_in_month,
0 as priority
FROM
myTable
WHERE
weekending >= now() - interval '1 year'
group by id,month,year;
UPDATE temp_highest_per_month t
SET priority =
(select count(*) from temp_highest_per_month t2
where t2.id = t.id and
(t.max_in_month < t2.max_in_month or
(t.max_in_month= t2.max_in_month and
t.year * 12 + t.month > t2.year * 12 + t2.month)));
select id,round(avg(max_in_month),0)
from temp_highest_per_month
where priority < 3 -- priority counts the rows ranked above, so the top three are 0, 1, 2
group by id;
The year and month are included when working out the priority so that if two months have the same maximum, they'll still be numbered correctly.
9.1 version
Similar to Igor's answer, but I used the With clause to split the steps.
with highest_per_month as
( select
id,
extract(month from weekending) as month,
extract(year from weekending) as year,
max(maxattached) as max_in_month
FROM
myTable
WHERE
weekending >= now() - interval '1 year'
group by id,month,year),
prioritised as
( select id, month, year, max_in_month,
row_number() over (partition by id
order by max_in_month desc)
as priority
from highest_per_month
)
select id, round(avg(max_in_month),0)
from prioritised
where priority <= 3
group by id;