SQL Bigquery Counting repeated customers from transaction table - sql

I have a transaction table that looks something like this.
userid
orderDate
amount
111
2021-11-01
20
112
2021-09-07
17
111
2021-11-21
17
I want to count how many distinct customers (userid) that bought from our store this month also bought from our store in the previous month. For example, in February 2020, we had 20 customers and out of these 20 customers 7 of them also bought from our store in the previous month, January 2020. I want to do this for all the previous months so ending up with something like.
year
month
repeated customers
2020
01
11
2020
02
7
2020
03
9
I have written this but this only works for only the current month. How would I iterate or rewrite it to get the table as shown above.
WITH CURRENT_PERIOD AS (
SELECT DISTINCT userid
FROM table1
WHERE DATE(orderDate) BETWEEN DATE_TRUNC(CURRENT_DATE(),MONTH) AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
),
PREVIOUS_PERIOD AS (
SELECT DISTINCT userid
FROM table1
WHERE DATE(orderDate) BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH),MONTH) AND LAST_DAY(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH))
)
SELECT count(1)
FROM CURRENT_PERIOD RC
WHERE RC.userid IN (SELECT DISTINCT userid FROM PREVIOUS_PERIOD)

You can summarize to get one record per month, use lag(), and then aggregate:
select yyyymm,
countif(prev_yyyymm = date_add(yyyymm, interval -1 month)
from (select userid, date_trunc(order_date, month) as yyyymm,
lag(date_trunc(order_date, month)) over (partition by userid order by date_trunc(order_date, month)) as prev_yyyymm
from table1
group by 1, 2
) t
group by yyyymm
order by yyyymm;

Related

How to select data with an unusual grouping by date?

There is a table:
id
direction_id
created_at
1
2
22 November 2021 г., 16:00:00
2
2
22 November 2021 г., 16:20:00
43
2
22 November 2021 г., 16:25:00
455
1
22 November 2021 г., 16:27:00
6567
2
22 November 2021 г., 17:36:00
674556
2
22 November 2021 г., 20:01:00
5243554
1
22 November 2021 г., 20:50:00
5243554
1
22 November 2021 г., 21:46:00
I need to get the following result:
1
2
created_at_by_hour
1
3
22.11.21 17
1
4
22.11.21 18
1
4
22.11.21 19
1
4
22.11.21 20
2
5
22.11.21 21
3
5
22.11.21 22
1 and 2 in the header are all possible values of direction_id that are in the table.
created_at is reduced to hours and you need to count how many records satisfy the condition <= created_at_by_hour. But the grouping should be such that if the time (hour) when no records were created, then just duplicate the previous hour.
The table consists of three fields - id (int), direction_id (int), created_at (timestamptz). I need to get an hourly (based on the created_at field) data upload with the number of records created before this "grouped" time. But I need not just the number, but separately for each direction_id (there are only two of them - 1 and 2). If no records were created for a certain direction_id at a certain hour, duplicate the previous one, but the result should end at the last created_at. created_at is the time when the record was created.
In my opinion, better to generate a date between min and max date according to an hour then calculate the count of each direction.
Demo
with time_range as (
select
min(created_at) + interval '1 hour' as min,
max(created_at) + interval '1 hour' as max
from test
)
select
count(*) filter (where direction_id = 1) as "1",
count(*) filter (where direction_id = 2) as "2",
to_char(gs.hour, 'dd.mm.yy HH24') as created_at_by_hour
from
test t
cross join time_range tr
inner join generate_series(tr.min, tr.max, interval '1 hour') gs(hour)
on t.created_at <= gs.hour
group by gs.hour
order by gs.hour
Truncate the date down to the hour, group by it and count. Then use SUM OVER to get a running total of the counts. In order to show missing hours in the table, you must generate a series of hours and outer join your data.
with hourly as
(
select date_trunc('hour', created_at) as hour, direction_id from mytable
)
, hours(hour) as
(
select *
from generate_series
(
(select min(hour) from hourly), (select max(hour) from hourly), interval '1 hour'
)
)
select
hours.hour,
sum(count(*) filter (where hourly.direction_id = 1)) over (order by hour) as "1",
sum(count(*) filter (where hourly.direction_id = 2)) over (order by hour) as "2"
from hours
left join hourly using (hour)
group by hour
order by hour;
Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=21d0c838452a09feac4ebc57906829f4

SQL Count Entries for each Month of the last 6 Months

I got a problem while trying to count the entries that were created in a month for the last 6 months.
The table looks like this:
A B C D
Year Month Startingdate Identifier
-----------------------------------------
2019 3 2019-03-12 OAM_1903121
2019 2 2019-03-21 OAM_1902211
And the result should look like:
A B C
Year Month Amount of orders
---------------------------------
2019 3 26
2019 2 34
This is what I have so far, but it doesn't get me the proper results:
SELECT year, month, COUNT(Startingdate) as Amount
FROM table
WHERE Startingdate > ((TRUNC(add_months(sysdate,-3) , 'MM'))-1)
GROUP BY year, month
I have not tested it, but it should work:
select year, month, count(Stringdate) as Amount_of_order
from table
where Stringdate between add_months(sysdate, -6) and sysdate
group by year, month;
Let me know.
Try that :
SELECT YEAR(Startingdate) AS [Year], MONTH(Startingdate) AS [Month], COUNT(*) AS Amount
FROM table
WHERE Startingdate > DATEADD(MONTH, -6, GETDATE())
GROUP BY YEAR(Startingdate), MONTH(Startingdate)
ORDER BY YEAR(Startingdate), MONTH(Startingdate) DESC
I think your issue is the filtering. If so, this should handle the most recent six full months:
SELECT year, month, COUNT(*) as num_orders
FROM table
WHERE Startingdate >= TRUNC(add_months(sysdate, -6) , 'MM')
GROUP BY year, month;

Insert zero values for unexisting groups in Redshift

I'm writing a simple query on Amazon Redshift as follows:
SELECT EXTRACT(year FROM created_at) AS year,
EXTRACT(month FROM created_at) AS month,
member_id,
COUNT(*) as pageviews
FROM TABLE
GROUP BY year,
month,
member_id
ORDER BY year,
month,
member_id
This gives me the following result as an example:
year month member_id pageviews
2015 1 100 29
2015 2 100 22
2015 3 100 178
2015 4 100 34
2015 1 200 56
2015 3 200 16
Here's the result I would like to have:
year month member_id pageviews
2015 1 100 29
2015 2 100 22
2015 3 100 178
2015 4 100 34
2015 1 200 56
2015 2 200 0
2015 3 200 16
2015 4 200 0
In the result above, notice the additional rows with zero pageviews.
How do I get this result? Any help would be much appreciated.
Use a cross join to generate the rows and then a left join to bring in the data:
SELECT EXTRACT(year FROM created_at) AS year,
EXTRACT(month FROM created_at) AS month,
m.member_id,
COUNT(t.member_id) as pageviews
FROM (SELECT DISTINCT EXTRACT(year FROM created_at) AS year, EXTRACT(month FROM created_at) AS month FROM TABLE) ym CROSS JOIN
(SELECT DISTINCT member_id FROM TABLE) m LEFT JOIN
TABLE t
ON EXTRACT(year FROM created_at) AS month = ym.year AND
EXTRACT(month FROM created_at) AS month = ym.month AND
t.member_id = m.member_id
GROUP BY ym.year, ym.month, m.member_id
ORDER BY ym.year, ym.month, m.member_id;
This assumes that all year/month combinations are included in the table.
If you have other tables that are better sources for members and the dates, try them -- that may be faster than SELECT DISTINCT.

postgreSQL- Count for value between previous month start date and end date

I have a table as follows
user_id date month year visiting_id
123 11-04-2017 APRIL 2017 4500
123 12-05-2017 MAY 2017 4567
123 13-05-2017 MAY 2017 4568
123 17-05-2017 MAY 2017 4569
123 22-05-2017 MAY 2017 4570
123 11-06-2017 JUNE 2017 4571
123 12-06-2017 JUNE 2017 4572
I want to calculate the visiting count for the current month and last month at the monthly level as follows:
user_id month year visit_count_this_month visit_count_last_month
123 APRIL 2017 1 0
123 MAY 2017 4 1
123 JUNE 2017 2 4
I was able to calculate visit_count_this_month using the following query
SELECT v.user_id, v.month, v.year,
SUM(is_visit_this_month) as visit_count_this_month
FROM
(SELECT user_id, date, month, year,
CASE WHEN TO_CHAR(date, 'MM/YYYY') = TO_CHAR(date, 'MM/YYYY')
THEN 1 ELSE 0
END as is_visit_this_month
FROM visits
GROUP BY user_id, date, month, year
HAVING user_id = 123) v
GROUP BY v.user_id, v.month, v.year
However, I'm stuck with calculating visit_count_last_month. Similar to this, I also want to calculate visit_count_last_2months.
Can somebody help?
You can use a LATERAL JOIN like this:
SELECT user_id, month, year, COUNT(*) as visit_count_this_month, visit_count_last_month
FROM visits v
CROSS JOIN LATERAL (
SELECT COUNT(*) as visit_count_last_month
FROM visits
WHERE user_id = v.user_id
AND date = (CAST(v.date AS date) - interval '1 month')
) l
GROUP BY user_id, month, year, visit_count_last_month;
SQLFiddle - http://sqlfiddle.com/#!15/393c8/2
Assuming there are values for every month, you can get the counts per month first and use lag to get the previous month's values per user.
SELECT T.*
,COALESCE(LAG(visits,1) OVER(PARTITION BY USER_ID ORDER BY year,mth),0) as last_month_visits
,COALESCE(LAG(visits,2) OVER(PARTITION BY USER_ID ORDER BY year,mth),0) as last_2_month_visits
FROM (
SELECT user_id, extract(month from date) as mth, year, COUNT(*) as visits
FROM visits
GROUP BY user_id, extract(month from date), year
) T
If there can be missing months, it is best to generate all months within a specified timeframe and left join ing the table on to that. (This example shows it for all the months in 2017).
select user_id,yr,mth,visits
,coalesce(lag(visits,1) over(PARTITION BY USER_ID ORDER BY yr,mth),0) as last_month_visits
,coalesce(lag(visits,2) OVER(PARTITION BY USER_ID ORDER BY yr,mth),0) as last_2_month_visits
from (select u.user_id,extract(year from d.dt) as yr, extract(month from d.dt) as mth,count(v.visiting_id) as visits
from generate_series(date '2017-01-01', date '2017-12-31',interval '1 month') d(dt)
cross join (select distinct user_id from visits) u
left join visits v on extract(month from v.dt)=extract(month from d.dt) and extract(year from v.dt)=extract(year from d.dt) and u.user_id=v.user_id
group by u.user_id,extract(year from d.dt), extract(month from d.dt)
) t

PostgreSQL group by and order by

I have a table with a date column. I wanted to get the count of months and display them in the order of months. Months should be displayed as 'Jan', 'Feb' etc. If I use to_char function, the order by happens on text. I can use extract(month from dt), but that will also display month in number format. This is part of a report and month should be displayed in 'Mon' format only.
SELECT to_char(dt,'Mon'), COUNT(*) FROM tb GROUP BY to_char(dt,'Mon') ORDER BY to_char(dt,'Mon');
to_char | count
---------+-------
Dec | 1
Jan | 1
Jul | 2
select month, total
from (
select
extract(month from dt) as month_number,
to_char(dt,'mon') as month,
count(*) as total
from tb
group by 1, 2
) s
order by month_number