Month statistics - sql

I have a table in Postgres and MySQL with a 'created_at' column. I would like to query it for the following:
Month Count
1 0
2 0
3 0
4 12
5 15
...
Can anyone cough up some SQL? Note that months with no rows must be listed with a count of 0. I have this:
SELECT month(created_at) as month, count(*) as c
FROM `sale_registrations`
WHERE (created_at>='2011-01-01' and created_at<='2011-12-31')
GROUP BY month(created_at)
ORDER BY month(created_at)

Use EXTRACT(month FROM created_at) to get the month. This works in MySQL as well.
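Edit: Use a RIGHT JOIN on a table with the month numbers. Put the date filter in the ON clause (a WHERE filter on created_at would discard the months with no rows), group by the month number, and count a column of sale_registrations, because COUNT(*) would report 1 for an empty month:
-- smallint works in both MySQL and PostgreSQL (PostgreSQL has no tinyint)
CREATE TABLE months(nr smallint);
INSERT INTO months(nr) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);
SELECT
nr AS month,
COUNT(created_at) AS c
FROM
sale_registrations
RIGHT JOIN months ON EXTRACT(month FROM created_at) = nr
AND created_at >= '2011-01-01'
AND created_at < '2012-01-01'
GROUP BY
nr
ORDER BY
nr ASC;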
In PostgreSQL you could use generate_series(), but that's not going to work in MySQL.
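To see why the filter belongs in the ON clause and why the query counts created_at rather than using COUNT(*), here is a minimal sketch run against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL/PostgreSQL: a recursive CTE plays the role of the months table (or generate_series()), and strftime() replaces EXTRACT(). The sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres/MySQL table.
# The table name comes from the question; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_registrations (created_at TEXT)")
conn.executemany(
    "INSERT INTO sale_registrations (created_at) VALUES (?)",
    [("2011-04-03 10:00:00",), ("2011-04-20 09:30:00",),
     ("2011-05-01 12:00:00",), ("2011-12-31 23:59:00",),
     ("2012-01-02 08:00:00",)],  # outside 2011, must not be counted
)

# The recursive CTE produces the numbers 1..12, like a months table.
# The date filter sits in the ON clause so unmatched months survive the
# outer join, and COUNT(s.created_at) skips the NULLs those months produce.
query = """
WITH RECURSIVE months(nr) AS (
    SELECT 1 UNION ALL SELECT nr + 1 FROM months WHERE nr < 12
)
SELECT m.nr AS month, COUNT(s.created_at) AS c
FROM months m
LEFT JOIN sale_registrations s
  ON CAST(strftime('%m', s.created_at) AS INTEGER) = m.nr
 AND s.created_at >= '2011-01-01'
 AND s.created_at < '2012-01-01'
GROUP BY m.nr
ORDER BY m.nr
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The half-open range (>= '2011-01-01' and < '2012-01-01') also keeps rows from later in the day on December 31, which a BETWEEN ... AND '2011-12-31' filter on a datetime column would drop.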

Related

How to group by week and distinct by day in postgresql

Sample contents:
id  created_dt           data
1   2023-01-14 11:52:41  {"customers": 1, "payments": 2}
2   2023-01-15 11:53:43  {"customers": 1, "payments": 2}
3   2023-01-18 11:51:45  {"customers": 1, "payments": 2}
4   2023-01-15 11:50:48  {"customers": 1, "payments": 2}
Rows 2 and 4 fall on the same day (2023-01-15), so only one of them should be counted.
I want to get a result as follows:
year  week  customers  payments
2023  2     2          4
2023  3     1          2
I solved the problem this way:
SELECT
date_part('year', sq.created_dt) AS year,
date_part('week', sq.created_dt) AS week,
sum((sq.data->'customers')::int) AS customers,
sum((sq.data->'payments')::int) AS payments
FROM
(SELECT DISTINCT ON (created_dt::date) created_dt, data
FROM analytics) sq
GROUP BY
year, week
ORDER BY
year, week;
However, that subquery greatly complicates the query. Is there a better method?
I need to group the data by week, but I also need to remove duplicate days.
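The intended semantics (keep one row per calendar day, then sum per week) can be sketched in plain Python on the sample rows above. This only illustrates the logic and is not a replacement for the SQL. Keeping the earliest row of each day is an assumption, since DISTINCT ON without a matching ORDER BY keeps an arbitrary row. Weeks are ISO weeks, which is what date_part('week', ...) returns in PostgreSQL.

```python
import json
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (id, created_dt, data).
rows = [
    (1, "2023-01-14 11:52:41", '{"customers": 1, "payments": 2}'),
    (2, "2023-01-15 11:53:43", '{"customers": 1, "payments": 2}'),
    (3, "2023-01-18 11:51:45", '{"customers": 1, "payments": 2}'),
    (4, "2023-01-15 11:50:48", '{"customers": 1, "payments": 2}'),
]

# Step 1: keep one row per calendar day (like DISTINCT ON (created_dt::date)).
# Iterating in timestamp order means the earliest row of each day is kept.
one_per_day = {}
for _id, created, data in sorted(rows, key=lambda r: r[1]):
    ts = datetime.strptime(created, "%Y-%m-%d %H:%M:%S")
    one_per_day.setdefault(ts.date(), json.loads(data))

# Step 2: sum the surviving rows per ISO year and ISO week.
totals = defaultdict(lambda: [0, 0])
for day, data in one_per_day.items():
    iso_year, iso_week, _ = day.isocalendar()
    totals[(iso_year, iso_week)][0] += data["customers"]
    totals[(iso_year, iso_week)][1] += data["payments"]

for (year, week), (customers, payments) in sorted(totals.items()):
    print(year, week, customers, payments)
```

This prints the expected result (2023 2 2 4, then 2023 3 1 2). Around New Year the ISO year can differ from the calendar year, so pairing date_part('week', ...) with date_part('isoyear', ...) rather than 'year' avoids splitting one week across two rows.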
Using generate_series() to create the week buckets for the join is another approach:
SELECT sum((a.data->>'customers')::int) AS customers,
sum((a.data->>'payments')::int) AS payments,
date_part('year', dategroup) AS year,
date_part('week', dategroup) AS week
FROM generate_series(current_date, current_date + interval '1 month', interval '1 week') AS dategroup
JOIN analytics AS a ON a.created_dt >= dategroup AND a.created_dt < dategroup + interval '1 week'
GROUP BY dategroup
ORDER BY dategroup;
Note that this buckets rows into 7-day windows starting at current_date and does not by itself remove duplicate days.
First of all, I think your query is quite simple and understandable.
Here is the same query written with a WITH query (CTE); to some extent it adds readability:
WITH unique_days_data AS (
SELECT DISTINCT created_dt::date AS created_day, data
FROM analytics)
SELECT
date_part('year', ud.created_day) AS year,
date_part('week', ud.created_day) AS week,
sum((ud.data->>'customers')::int) AS customers,
sum((ud.data->>'payments')::int) AS payments
FROM unique_days_data ud
GROUP BY year, week
ORDER BY year, week;
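The difference is that this query uses plain DISTINCT rather than DISTINCT ON: two rows from the same day are collapsed only if their data is also identical (which holds for the sample data), and it requires data to be jsonb, since the json type has no equality operator.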
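A small sketch of that difference, using SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. SQLite has no DISTINCT ON, so ROW_NUMBER() over a per-day partition emulates it (keeping the earliest row, an assumption). The two rows with differing data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analytics (id INTEGER, created_dt TEXT, data TEXT)")
# Hypothetical rows: two readings on 2023-01-15 whose data differ.
conn.executemany(
    "INSERT INTO analytics VALUES (?, ?, ?)",
    [(1, "2023-01-15 11:50:48", '{"customers": 1, "payments": 2}'),
     (2, "2023-01-15 11:53:43", '{"customers": 2, "payments": 3}')],
)

# Plain DISTINCT compares the whole (day, data) pair, so both rows survive.
plain = conn.execute(
    "SELECT DISTINCT date(created_dt), data FROM analytics"
).fetchall()

# One row per day, as DISTINCT ON (created_dt::date) would return.
# ROW_NUMBER() numbers the rows of each day by timestamp; rn = 1 is kept.
per_day = conn.execute("""
    SELECT day, data FROM (
        SELECT date(created_dt) AS day, data,
               ROW_NUMBER() OVER (PARTITION BY date(created_dt)
                                  ORDER BY created_dt) AS rn
        FROM analytics
    ) WHERE rn = 1
""").fetchall()

print(len(plain), len(per_day))  # 2 1
```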
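You can simplify it by also grouping on "created_dt::date", then keeping only the last aggregated day of each week using FETCH FIRST n ROWS WITH TIES (PostgreSQL 13 or later):
SELECT date_part('year', created_dt) AS year,
date_part('week', created_dt) AS week,
SUM((data->>'customers')::int) AS customers,
SUM((data->>'payments')::int) AS payments
FROM analytics
GROUP BY year, week, created_dt::date
ORDER BY ROW_NUMBER() OVER(
PARTITION BY date_part('year', created_dt), date_part('week', created_dt)
ORDER BY created_dt::date DESC
)
FETCH FIRST 1 ROWS WITH TIES;
Be aware that this returns, for each week, the totals of the latest day only: all rows sharing that day are summed, and the earlier days of the week are dropped. With the sample data this happens to match the expected output (the two rows of 2023-01-15 sum to 2 customers and 4 payments), but it is not equivalent to the DISTINCT ON query in general.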

Select and Count Multiple Group By SQL

Can someone tell me how to do this in SQL?
I've tried this SQL:
SELECT disastertype, YEAR(eventdate) as year,
COUNT(disastertype) AS disastertype_total
FROM v_disasterlogs_all
WHERE YEAR(eventdate) >= year(CURRENT_TIMESTAMP) - 4
GROUP BY YEAR(eventdate)
ORDER BY YEAR(eventdate) ASC
But it only shows this:
Include disastertype in your GROUP BY clause:
SELECT disastertype, YEAR(eventdate) as year,
COUNT(disastertype) AS disastertype_total
FROM v_disasterlogs_all
WHERE YEAR(eventdate) >= year(CURRENT_TIMESTAMP) - 4
GROUP BY YEAR(eventdate), disastertype
ORDER BY YEAR(eventdate) ASC
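A minimal sketch of the effect of adding disastertype to the GROUP BY, run on SQLite through Python's sqlite3 module. strftime('%Y', ...) stands in for YEAR(), the rows are made up, and the "last four years" filter is left out so the output does not depend on today's date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v_disasterlogs_all (disastertype TEXT, eventdate TEXT)")
# Hypothetical sample events.
conn.executemany(
    "INSERT INTO v_disasterlogs_all VALUES (?, ?)",
    [("Flood", "2021-03-01"), ("Flood", "2021-07-15"),
     ("Fire", "2021-08-02"), ("Flood", "2022-02-10")],
)

# Grouping by both year and type yields one row per (year, type) pair,
# so every selected column is either grouped or aggregated.
rows = conn.execute("""
    SELECT disastertype,
           CAST(strftime('%Y', eventdate) AS INTEGER) AS year,
           COUNT(disastertype) AS disastertype_total
    FROM v_disasterlogs_all
    GROUP BY year, disastertype
    ORDER BY year, disastertype
""").fetchall()
print(rows)  # [('Fire', 2021, 1), ('Flood', 2021, 2), ('Flood', 2022, 1)]
```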
I am assuming you want a sequential number (the index column) associated with each unique year?
In that case, a possible solution in PostgreSQL is:
select
dense_rank() over (order by date_part('year', (eventdate))) as index ,
date_part('year', (eventdate)) as year,
disastertype,
count(disastertype)
from
v_disasterlogs_all
where
date_part('year', (eventdate)) >= date_part('year', now()) - 4
group by
year,
disastertype
order by
year asc;
In PostgreSQL, I used the date_part function to extract the year from the timestamp.
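The same ranking can be reproduced on SQLite (3.25 or later, for window functions) through Python's sqlite3 module. In this sketch the aggregation is done in a subquery and DENSE_RANK() is applied on top; the column is named idx rather than index, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v_disasterlogs_all (disastertype TEXT, eventdate TEXT)")
conn.executemany(
    "INSERT INTO v_disasterlogs_all VALUES (?, ?)",
    [("Flood", "2021-03-01"), ("Flood", "2021-07-15"),
     ("Fire", "2021-08-02"), ("Flood", "2022-02-10")],
)

# DENSE_RANK() gives every row of the same year the same number,
# with no gaps between consecutive years.
rows = conn.execute("""
    SELECT DENSE_RANK() OVER (ORDER BY year) AS idx, year, disastertype, total
    FROM (
        SELECT CAST(strftime('%Y', eventdate) AS INTEGER) AS year,
               disastertype,
               COUNT(disastertype) AS total
        FROM v_disasterlogs_all
        GROUP BY year, disastertype
    )
    ORDER BY year, disastertype
""").fetchall()
print(rows)
```

Here rows comes out as (1, 2021, 'Fire', 1), (1, 2021, 'Flood', 2), (2, 2022, 'Flood', 1): both 2021 rows share index 1.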

Retrieve Customers with a Monthly Order Frequency greater than 4

I am trying to optimize the query below to fetch all customers who have placed 4 or more orders in each of the past three months.
Customer ID  Feb  Mar  Apr
0001         4    5    6
0002         3    2    4
0003         4    2    3
In the above table, only the customer with Customer ID 0001 should be picked, as they consistently have 4 or more orders every month.
Below is the query I have written. It pulls all customers with an average purchase frequency of 4 in the last 90 days, but it does not check that there were 4 or more purchases in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code) = "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried to use window functions, but I'm not sure they are needed in this case.
I would count the orders per month instead of computing the average, and then retrieve the customers whose count is 4 or more in each of the last three months.
Also, I think you should select your interval by calendar month (e.g. "month(CURRENT_DATE()) - 3") instead of using a window of 90 days. Of course, you then need to handle the case when the current month is January, February or March, in which case you have to go back to October, November or December of the previous year.
I'm not familiar with Google BigQuery so I can't write your query but I hope this helps.
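The approach described above (count per month, then require every month to reach the threshold) can be sketched in plain Python using the monthly counts from the question's table. Treating a missing month as 0 orders is an assumption about how absent data should be handled.

```python
# Monthly order counts from the question's table (customer -> month -> orders).
monthly_orders = {
    "0001": {"Feb": 4, "Mar": 5, "Apr": 6},
    "0002": {"Feb": 3, "Mar": 2, "Apr": 4},
    "0003": {"Feb": 4, "Mar": 2, "Apr": 3},
}
months = ["Feb", "Mar", "Apr"]
threshold = 4

# A customer qualifies only if every one of the three months reaches the
# threshold; a month with no entry counts as 0 orders.
qualifying = [
    cust for cust, counts in monthly_orders.items()
    if all(counts.get(m, 0) >= threshold for m in months)
]
print(qualifying)  # ['0001']
```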
So I've found a solution to this using a WITH clause, as below:
WITH filtered_orders AS (
select
customer_id ID,
extract(MONTH from date) Order_Month,
count(order_id) CountofOrders
from customer_order_lines lines
where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
group by ID, Order_Month
having CountofOrders>=4)
select ID
from filtered_orders
group by ID
having count(Order_Month) =3;
Hope this helps!
Another option is to first count the orders per month and then keep the customers whose count reaches your threshold in every month:
WITH ORDERS_BY_MONTH AS (
SELECT
DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
lines.customer_id Customer_ID,
-- count orders, not order lines
COUNT(DISTINCT lines.order_id) PurchaseFrequency
FROM fct_customer_order_lines lines
JOIN product_table product
ON lines.entity_id = product.entity_id
AND lines.vendor_id = product.vendor_id
WHERE UPPER(product.country_code) = "IN"
-- the three full calendar months before the current one
AND lines.date >= DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 3 MONTH), MONTH)
AND lines.date < DATE_TRUNC(CURRENT_DATE(), MONTH)
GROUP BY PurchaseMonth, Customer_ID
)
SELECT
Customer_ID,
AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
-- orders in all three months, and at least 4 in each of them
HAVING COUNT(1) = 3 AND COUNTIF(PurchaseFrequency >= 4) = 3
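One subtlety: COUNT(1) = COUNTIF(PurchaseFrequency >= 4) on its own also accepts a customer who ordered in only one of the months, so a check that all three months are present is needed as well. The sketch below shows this on SQLite through Python's sqlite3 module. SUM(CASE ...) stands in for BigQuery's COUNTIF, a fixed February to April 2022 range replaces the CURRENT_DATE() arithmetic, and the orders table with customers A and B is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, order_id INTEGER, order_date TEXT)")

# Hypothetical data: customer A orders 4 times in each of Feb, Mar and Apr;
# customer B orders 5 times, but only in April.
rows = []
order_id = 0
for month in ("02", "03", "04"):
    for day in range(1, 5):
        order_id += 1
        rows.append(("A", order_id, f"2022-{month}-{day:02d}"))
for day in range(1, 6):
    order_id += 1
    rows.append(("B", order_id, f"2022-04-{day:02d}"))
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

query = """
WITH orders_by_month AS (
    SELECT strftime('%Y-%m', order_date) AS purchase_month,
           customer_id,
           COUNT(DISTINCT order_id) AS purchase_frequency
    FROM orders
    WHERE order_date >= '2022-02-01' AND order_date < '2022-05-01'
    GROUP BY purchase_month, customer_id
)
SELECT customer_id
FROM orders_by_month
GROUP BY customer_id
HAVING {having}
ORDER BY customer_id
"""
hits = "SUM(CASE WHEN purchase_frequency >= 4 THEN 1 ELSE 0 END)"
# Only "every month present reaches 4": B passes with a single month.
loose = conn.execute(query.format(having=f"COUNT(*) = {hits}")).fetchall()
# Also require all three months to be present.
strict = conn.execute(query.format(having=f"COUNT(*) = 3 AND {hits} = 3")).fetchall()
print(loose, strict)  # [('A',), ('B',)] [('A',)]
```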

How to get number of billable customers per month SQL

This is what my table looks like:
NOTE: Don't worry about the BMI field being empty in some rows. We assume that each row is a reading. I have omitted some columns for privacy reasons.
I want to get a count of the number of active customers per month. A customer is active if they have at least 18 readings in total (1 reading per day for 18 days in a given month). How do I write this SQL query? Assume the table name is 'cust'. I'm using SQL Server. Any help is appreciated.
Presumably a patient is a customer in your world. If so, you can use two levels of aggregation:
select yyyy, mm, count(*)
from (select year(createdat) as yyyy, month(createdat) as mm,
patient_id,
count(distinct convert(date, createdat)) as num_days
from cust
group by year(createdat), month(createdat), patient_id
) ymp
where num_days >= 18
group by yyyy, mm;
You need to group by patient and month, then group again by month alone:
SELECT
mth,
COUNT(*) NumPatients
FROM (
SELECT
EOMONTH(c.createdat) mth
FROM cust c
GROUP BY EOMONTH(c.createdat), c.patient_id
HAVING COUNT(*) >= 18
-- for distinct days you could change it to:
-- HAVING COUNT(DISTINCT CAST(c.createdat AS date)) >= 18
) c
GROUP BY mth;
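The two HAVING variants above (all readings vs distinct days) can give different answers when a patient takes several readings on the same day. A minimal sketch on SQLite through Python's sqlite3 module, where strftime('%Y-%m', ...) stands in for EOMONTH() and date() for CAST(... AS date); the readings are made up.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (patient_id INTEGER, createdat TEXT)")

# Hypothetical January 2021 readings:
# patient 1 takes one reading per day on 18 days,
# patient 2 takes two readings per day on 9 days (18 readings, 9 days).
rows = []
for d in range(18):
    rows.append((1, f"{date(2021, 1, 1) + timedelta(days=d)} 08:00:00"))
for d in range(9):
    day = date(2021, 1, 1) + timedelta(days=d)
    rows.append((2, f"{day} 08:00:00"))
    rows.append((2, f"{day} 20:00:00"))
conn.executemany("INSERT INTO cust VALUES (?, ?)", rows)

# Inner query: one row per (month, patient) that meets the threshold.
# Outer query: count those patients per month.
query = """
SELECT mth, COUNT(*) AS num_patients
FROM (
    SELECT strftime('%Y-%m', createdat) AS mth, patient_id
    FROM cust
    GROUP BY mth, patient_id
    HAVING {measure} >= 18
)
GROUP BY mth
"""
by_readings = conn.execute(query.format(measure="COUNT(*)")).fetchall()
by_days = conn.execute(
    query.format(measure="COUNT(DISTINCT date(createdat))")
).fetchall()
print(by_readings, by_days)  # [('2021-01', 2)] [('2021-01', 1)]
```

Since the question defines an active customer as one reading per day on 18 days, the distinct-day count is the closer match.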

Frequency distinct values grouped by date

I am trying to get the frequency of unique ID values for each month of the last year. However, the query fails with the error message "SELECT list expression references column user_id which is neither grouped nor aggregated".
How can I get the count of unique IDs in each month?
What I tried:
SELECT
user_id,
EXTRACT(MONTH FROM date) as month
FROM
TABLE
WHERE
date >= '2020-09-01'
GROUP BY
month
I want something like this:
month  count of unique user_id
1      300
2      200
...    ...
12     250
You would use GROUP BY and COUNT(DISTINCT):
SELECT EXTRACT(MONTH FROM date) as month, COUNT(DISTINCT user_id)
FROM TABLE
WHERE date >= '2020-09-01'
GROUP BY 1;
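I would advise you to include the year in the query: the data starts in September 2020, so without it the same month of different years would be counted together. In BigQuery, this is simplest using DATE_TRUNC():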
SELECT DATE_TRUNC(date, MONTH) as month, COUNT(DISTINCT user_id)
FROM TABLE
WHERE date >= '2020-09-01'
GROUP BY 1;
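To show why the year matters, here is a sketch on SQLite through Python's sqlite3 module, with strftime() standing in for EXTRACT() and DATE_TRUNC(). The events table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2020-09-05"), (2, "2020-09-20"), (1, "2020-09-28"),
     (1, "2021-01-10"), (3, "2021-09-03")],
)

# Month number only: September 2020 and September 2021 are merged.
by_month = conn.execute("""
    SELECT CAST(strftime('%m', event_date) AS INTEGER) AS month,
           COUNT(DISTINCT user_id)
    FROM events
    WHERE event_date >= '2020-09-01'
    GROUP BY 1 ORDER BY 1
""").fetchall()

# Year and month: each calendar month is counted separately.
by_year_month = conn.execute("""
    SELECT strftime('%Y-%m', event_date) AS month, COUNT(DISTINCT user_id)
    FROM events
    WHERE event_date >= '2020-09-01'
    GROUP BY 1 ORDER BY 1
""").fetchall()

print(by_month)       # [(1, 1), (9, 3)]
print(by_year_month)  # [('2020-09', 2), ('2021-01', 1), ('2021-09', 1)]
```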