This is what my table looks like:
NOTE: Don't worry about the BMI field being empty in some rows. We assume that each row is a reading. I have omitted some columns for privacy reasons.
I want to get a count of the number of active customers per month. A customer is active if they have at least 18 readings in total (1 reading per day for 18 days in a given month). How do I write this SQL query? Assume the table name is 'cust'. I'm using SQL Server. Any help is appreciated.
Presumably a patient is a customer in your world. If so, you can use two levels of aggregation:
select yyyy, mm, count(*)
from (select year(createdat) as yyyy, month(createdat) as mm,
patient_id,
count(distinct convert(date, createdat)) as num_days
from cust
group by year(createdat), month(createdat), patient_id
) ymp
where num_days >= 18
group by yyyy, mm;
You need to group by patient and month, then group again by just the month:
SELECT
mth,
COUNT(*) NumPatients
FROM (
SELECT
EOMONTH(c.createdat) mth
FROM cust c
GROUP BY EOMONTH(c.createdat), c.patient_id
HAVING COUNT(*) >= 18
-- for distinct days you could change it to:
-- HAVING COUNT(DISTINCT CAST(c.createdat AS date)) >= 18
) c
GROUP BY mth;
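To sanity-check the two-level aggregation above, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite only stands in for SQL Server here, so `strftime` and `date` replace `year()`/`month()`/`EOMONTH` and `CAST(... AS date)`. The function name `active_customers_per_month` and the `min_days` parameter are assumptions made for illustration.

```python
import sqlite3

def active_customers_per_month(rows, min_days=18):
    """Count patients per month with readings on at least `min_days` distinct days.

    `rows` is an iterable of (patient_id, createdat) with ISO timestamps.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cust (patient_id INTEGER, createdat TEXT)")
    con.executemany("INSERT INTO cust VALUES (?, ?)", rows)
    query = """
        SELECT mth, COUNT(*) AS num_patients
        FROM (
            -- inner level: one row per (month, patient) that meets the threshold
            SELECT strftime('%Y-%m', createdat) AS mth, patient_id
            FROM cust
            GROUP BY strftime('%Y-%m', createdat), patient_id
            HAVING COUNT(DISTINCT date(createdat)) >= ?
        ) AS per_patient
        GROUP BY mth      -- outer level: count qualifying patients per month
        ORDER BY mth
    """
    result = con.execute(query, (min_days,)).fetchall()
    con.close()
    return result
```

Counting distinct days rather than rows means that two readings on the same day do not help a patient reach the threshold. If every reading should count, `COUNT(*)` in the `HAVING` clause would be the equivalent change.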
I am trying to optimize the query below to fetch all customers who have had a monthly order frequency of 4 or more in each of the past three months.
Customer ID | Feb | Mar | Apr
------------|-----|-----|-----
0001 | 4 | 5 | 6
0002 | 3 | 2 | 4
0003 | 4 | 2 | 3
In the above table, only the customer with Customer ID 0001 should be picked, as he consistently has 4 or more orders per month.
Below is the query I have written. It pulls all customers with an average purchase frequency of 4 over the last 90 days, but it does not check that there were 4 or more purchases in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried to use window functions, but I'm not sure whether they are needed in this case.
I would count the orders per month instead of computing the average, then keep the customers whose count is at least 4 in each of the last three months.
Also, I think you should select the interval using "month(CURRENT_DATE()) - 3" rather than a 90-day window. If needed, handle the case where the current date falls in January, February, or March, in which case you have to go back to October, November, or December of the previous year.
I'm not familiar with Google BigQuery, so I can't write the query for you, but I hope this helps.
I found a solution using a WITH clause, as below:
WITH filtered_orders AS (
select
distinct customer_id ID,
extract(MONTH from date) Order_Month,
count(order_id) CountofOrders
from customer_order_lines lines
where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
group by ID, Order_Month
having CountofOrders>=4)
select distinct ID
from filtered_orders
group by ID
having count(Order_Month) =3;
Hope this helps!
One option is to first count the orders per month and then keep the users whose purchases are at or above your threshold in every month:
WITH ORDERS_BY_MONTH AS (
SELECT
DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
lines.customer_id Customer_ID,
COUNT(lines.order_id) PurchaseFrequency
FROM fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY PurchaseMonth, Customer_ID
)
SELECT
Customer_ID,
AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
HAVING COUNT(1) = COUNTIF(PurchaseFrequency >= 4)
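The "count per month, then require every month to qualify" idea can be tried out with a small sketch. It uses Python with an in-memory SQLite database in place of BigQuery, so `strftime` replaces `DATE_TRUNC`. The table layout, the function name `consistent_customers`, and the explicit list of months are assumptions for illustration, not part of the original queries.

```python
import sqlite3

def consistent_customers(orders, months, min_orders=4):
    """Return customer ids with at least `min_orders` orders in every month in `months`.

    `orders` is an iterable of (customer_id, order_id, order_date) with ISO dates;
    `months` is a list of 'YYYY-MM' strings.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE lines (customer_id TEXT, order_id INTEGER, date TEXT)")
    con.executemany("INSERT INTO lines VALUES (?, ?, ?)", orders)
    placeholders = ",".join("?" for _ in months)
    query = f"""
        WITH orders_by_month AS (
            SELECT customer_id,
                   strftime('%Y-%m', date) AS purchase_month,
                   COUNT(order_id) AS purchase_frequency
            FROM lines
            WHERE strftime('%Y-%m', date) IN ({placeholders})
            GROUP BY customer_id, strftime('%Y-%m', date)
        )
        SELECT customer_id
        FROM orders_by_month
        WHERE purchase_frequency >= ?
        GROUP BY customer_id
        HAVING COUNT(*) = ?   -- qualified in every requested month
        ORDER BY customer_id
    """
    params = [*months, min_orders, len(months)]
    ids = [row[0] for row in con.execute(query, params)]
    con.close()
    return ids
```

Comparing the number of qualifying months against the number of requested months, rather than against the number of months in which the customer has rows, also rejects a customer who placed no orders at all in one of the months.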
Is there a way to count how many strings in a specific column are seen for the first time?
The value in column 2 is sometimes repeated, because some clients make several transactions at different times (a client can make a transaction in one month and another one the following year).
Is there a way for me to count how many IDs are completely new per month through a group by (never seen before)?
Please let me know if you need more context.
Thanks!
A simple way is two levels of aggregation. The inner level gets the first date for each customer. The outer summarizes by year and month:
select year(min_date), month(min_date), count(*) as num_firsts
from (select customerid, min(date) as min_date
from t
group by customerid
) c
group by year(min_date), month(min_date)
order by year(min_date), month(min_date);
Note that date/time functions depend on the database you are using, so the syntax for getting the year/month from the date may differ in your database.
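As a quick check of the "first date per customer, then summarize by month" approach, the following sketch runs it on an in-memory SQLite database from Python. SQLite is just one possible database, so `strftime` stands in for `year()`/`month()`. The function name `new_customers_per_month` and the two-column table are assumptions for illustration.

```python
import sqlite3

def new_customers_per_month(transactions):
    """Return (year_month, num_firsts) rows, counting each customer once,
    in the month of their first transaction.

    `transactions` is an iterable of (customerid, date) with ISO 'YYYY-MM-DD' dates.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (customerid TEXT, date TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", transactions)
    query = """
        SELECT strftime('%Y-%m', min_date) AS first_month, COUNT(*) AS num_firsts
        FROM (
            -- inner level: first transaction date per customer
            SELECT customerid, MIN(date) AS min_date
            FROM t
            GROUP BY customerid
        ) AS c
        GROUP BY strftime('%Y-%m', min_date)
        ORDER BY first_month
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows
```

Because the inner query yields exactly one row per customer, a customer who returns in a later month or year is never counted again.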
You can assign a rank to each customer's transactions, ordered by date, so that rank 1 marks the first order for that customer_id.
The ranking is done in an inline view, which is then queried to return the month and the count of customer IDs for that month, but only for rows with rank = 1.
I have tested this on Oracle and it works as expected.
SELECT DISTINCT
EXTRACT(MONTH FROM date_of_transaction) AS month,
COUNT(customer_id)
FROM
(
SELECT
date_of_transaction,
customer_id,
RANK() OVER(PARTITION BY customer_id
ORDER BY
date_of_transaction ASC
) AS rank
FROM
table_1
)
WHERE
rank = 1
GROUP BY
EXTRACT(MONTH FROM date_of_transaction)
ORDER BY
EXTRACT(MONTH FROM date_of_transaction) ASC;
First, associate every ID with the year and month in which it is completely new, then count while grouping by year and month:
SELECT count(*) as new_customers, extract(year from t1.date) as year,
extract(month from t1.date) as month FROM table t1
WHERE not exists (SELECT 1 FROM table t2 WHERE t1.id=t2.id AND t2.date<t1.date)
GROUP BY year, month;
Your results will contain the new customer count, the year, and the month.
I'm trying to pull this through in Postgres 11.8:
SELECT count(distinct e.id) counter_employees,
(SELECT count(distinct id) FROM employees
WHERE date_trunc('month',date_hired) = period AND company = 11
) hires
FROM employees e
WHERE period IN (SELECT DISTINCT make_date(...) FROM amounts)
I can't figure out how to declare that the period the subquery should check is outside the subquery. Also, the period is not from a table but generated, so there is not a column in amounts to relate to the employees inside the subquery.
employee table:
id date_hired company
431 2020-01-03 11
422 2020-01-02 11
323 2020-02-03 11
amounts table:
payment_period amount company
202001 999 11
202002 999 11
For every payment period in amounts I want to get some data such as employee count and hires of that period:
period count hires
202001 5 1
202002 6 ...
One option uses aggregation and window functions. If you have hires for each month, then you can get the information directly from employees, like so:
select
date_trunc('month', date_hired) month_hired,
sum(count(*)) over(order by date_trunc('month', date_hired)) no_employees,
count(*) hires
from employees
group by date_trunc('month', date_hired)
On the other hand, if there are months without hires, then you could use generate_series() to create the list of months, then bring the employees with a left join, and aggregate:
select
d.month_hired,
sum(count(e.id)) over(order by d.month_hired) no_employees,
count(e.id) hires
from (
select generate_series(
date_trunc('month', min(date_hired)),
date_trunc('month', max(date_hired)),
interval '1' month
) month_hired
from employees
) d
left join employees e
on e.date_hired >= d.month_hired
and e.date_hired < d.month_hired + interval '1' month
group by d.month_hired
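The gap-filling variant can be sketched outside Postgres as well. The example below uses Python with an in-memory SQLite database, where a recursive CTE plays the role of `generate_series()`. It assumes a SQLite build with window-function support (3.25 or later); the function name `hires_per_month` and the reduced two-column `employees` table are assumptions for illustration.

```python
import sqlite3

def hires_per_month(employees):
    """Return (month_start, no_employees, hires) for every month between the
    first and last hire, including months without any hires.

    `employees` is an iterable of (id, date_hired) with ISO 'YYYY-MM-DD' dates.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employees (id INTEGER, date_hired TEXT)")
    con.executemany("INSERT INTO employees VALUES (?, ?)", employees)
    query = """
        WITH RECURSIVE months(mon) AS (
            -- stand-in for generate_series(): one row per month start
            SELECT date(MIN(date_hired), 'start of month') FROM employees
            UNION ALL
            SELECT date(mon, '+1 month') FROM months
            WHERE mon < (SELECT date(MAX(date_hired), 'start of month') FROM employees)
        ),
        per_month AS (
            SELECT m.mon, COUNT(e.id) AS hires
            FROM months AS m
            LEFT JOIN employees AS e
              ON e.date_hired >= m.mon
             AND e.date_hired < date(m.mon, '+1 month')
            GROUP BY m.mon
        )
        -- running total of hires gives the head count so far
        SELECT mon, SUM(hires) OVER (ORDER BY mon) AS no_employees, hires
        FROM per_month
        ORDER BY mon
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows
```

`COUNT(e.id)` rather than `COUNT(*)` matters here: for a month without hires the left join produces one row of NULLs, which must count as zero.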
We could run another count for every period distilled from amounts, but that's expensive - unless there are only very few?
For more than a few, compute counts per period for the whole employees table, plus a running total. Then LEFT JOIN to it, should be pretty efficient:
SELECT mon AS period, e.mon_hired AS count, e.all_hired AS hires
FROM (
SELECT to_date(payment_period, 'YYYYMM') AS mon
FROM (SELECT DISTINCT payment_period FROM amounts) a0
) a
LEFT JOIN (
SELECT date_trunc('month', date_hired) AS mon
, count(*) AS mon_hired
, sum(count(*)) OVER (ORDER BY date_trunc('month', date_hired)) AS all_hired
FROM employees e
GROUP BY 1
) e USING (mon)
ORDER BY 1;
This assumes we can just count all employees hired so far to get the total number of hires. (Nobody ever gets fired.)
Works just fine as long as there are rows for every period. Else we need to fill in for the gaps. We can compute a complete grid, or default to the latest row in case of a missing month like this:
WITH e AS (
SELECT date_trunc('month', date_hired) AS mon
, count(*) AS mon_hired
, sum(count(*)) OVER (ORDER BY date_trunc('month', date_hired)) AS all_hired
FROM employees e
GROUP BY 1
)
SELECT mon AS period, ae.*
FROM (
SELECT to_date(payment_period, 'YYYYMM') AS mon
FROM (SELECT DISTINCT payment_period FROM amounts) a0
) a
LEFT JOIN LATERAL (
SELECT CASE WHEN e.mon = a.mon THEN e.mon_hired ELSE 0 END AS count -- ①
, e.all_hired AS hires
FROM e
WHERE e.mon <= a.mon
ORDER BY e.mon DESC
LIMIT 1
) ae ON true
ORDER BY 1;
① If nothing changed for the month, we need to fall back to the last month with data. Take the total count from there, but the monthly count is 0.
We can run a window function over an aggregate on the same query level. See:
Group and count events per time intervals, plus running total
Related:
PostgreSQL: running count of rows for a query 'by minute'
Aside: don't omit the AS keyword for a column alias. See:
Date column arithmetic in PostgreSQL query
The below query returns all USERS that have SUM(AMOUNT) > 10 in a given month. It includes Users in a month even if they don't meet the criteria in other months.
But I'd like to transform this query to return all USERS who must meet the criteria SUM(AMOUNT) > 10 every single month (i.e., from the first month in the table to the last one) across the entire data.
Put another way, exclude users who don't meet SUM(AMOUNT) > 10 every single month.
select USERS, to_char(transaction_date, 'YYYY-MM') as month
from Table
GROUP BY USERS, month
HAVING SUM(AMOUNT) > 10;
One approach uses a generated calendar table representing all months in your data set. We can left join this calendar table to your current query, and then aggregate over all months by user:
WITH months AS (
SELECT DISTINCT TO_CHAR(transaction_date, 'YYYY-MM') AS month
FROM yourTable
),
cte AS (
SELECT USERS, TO_CHAR(transaction_date, 'YYYY-MM') AS month
FROM yourTable
GROUP BY USERS, month
HAVING SUM(AMOUNT) > 10
)
SELECT
t.USERS
FROM months m
LEFT JOIN cte t
ON m.month = t.month
GROUP BY
t.USERS
HAVING
COUNT(t.USERS) = (SELECT COUNT(*) FROM months);
The HAVING clause above asserts that the number of months to which a user matches is in fact the total number of months. This would imply that the user meets the sum criteria for every month.
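A compact way to try this "qualifying months must equal all months" check is the sketch below, written in Python against an in-memory SQLite database, with `strftime` standing in for `TO_CHAR`. The table name `tab`, the function name `users_meeting_every_month`, and the sample data in the test are assumptions for illustration. Like the calendar CTE above, it only knows about months that appear somewhere in the data.

```python
import sqlite3

def users_meeting_every_month(transactions, threshold=10):
    """Return users whose monthly SUM(amount) exceeds `threshold` in every
    month that appears anywhere in the data.

    `transactions` is an iterable of (users, transaction_date, amount).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab (users INTEGER, transaction_date TEXT, amount REAL)")
    con.executemany("INSERT INTO tab VALUES (?, ?, ?)", transactions)
    query = """
        WITH monthly AS (
            SELECT users,
                   strftime('%Y-%m', transaction_date) AS month,
                   SUM(amount) AS total
            FROM tab
            GROUP BY users, strftime('%Y-%m', transaction_date)
        )
        SELECT users
        FROM monthly
        WHERE total > ?
        GROUP BY users
        -- number of qualifying months must equal the number of months in the data
        HAVING COUNT(*) = (SELECT COUNT(DISTINCT month) FROM monthly)
        ORDER BY users
    """
    result = [row[0] for row in con.execute(query, (threshold,))]
    con.close()
    return result
```

A user with no transactions in some month has no row for it, so their count of qualifying months falls short and they are excluded, the same as a user whose monthly sum is too low.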
Perhaps you could use a correlated subquery, such as:
select t.*
from (select distinct table.users from table) t
where not exists
(
select to_char(u.transaction_date, 'YYYY-MM') as month
from table u
where u.users = t.users
group by month
having sum(u.amount) <= 10
)
One option would be using sign(amount-10) vs. sign(amount) logic as
SELECT q.users
FROM
(
with tab(users, transaction_date,amount) as
(
select 1,date'2018-11-24',8 union all
select 1,date'2018-11-24',18 union all
select 2,date'2018-10-24',13 union all
select 3,date'2018-11-24',18 union all
select 3,date'2018-10-24',28 union all
select 3,date'2018-09-24', 3 union all
select 4,date'2018-10-24',28
)
SELECT users, to_char(transaction_date, 'YYYY-MM') as month,
sum(sign(amount-10)) as cnt1,
sum(sign(amount)) as cnt2
FROM tab t
GROUP BY users, month
) q
GROUP BY q.users
HAVING sum(q.cnt1) = sum(q.cnt2)
ORDER BY q.users
users
-----
2
4
You need to compare the number of months with a sum > 10 to the number of months between the min and the max date:
SELECT users, Count(flag) AS months, Min(mth), Max(mth)
FROM
(
SELECT users, date_trunc('month',transaction_date) AS mth,
CASE WHEN Sum(amount) > 10 THEN 1 end AS flag
FROM tab t
GROUP BY users, mth
) AS dt
GROUP BY users
HAVING -- adding the number of months > 10 to the min date and compare to max
Min(mth) + (INTERVAL '1' MONTH * (Count(flag)-1)) = Max(mth)
If missing months don't count, it would be a simple count(flag) = count(*).
I'm trying to find the number of employees who joined over a calendar year, broken down on a monthly basis. So if 15 employees had joined in January, 30 in February and so on, the output I'd like would be
Month | Employees
------|-----------
Jan | 15
Feb | 30
I've come up with a query to fetch it for a particular month
SELECT * FROM (
SELECT COUNT(EMP_NO), EMP_JN_DT
FROM EMP_REG WHERE
EMP_JN_DT between '01-NOV-09' AND '30-NOV-09'
GROUP BY EMP_JN_DT )
ORDER BY 2
How do I extend this for the full calendar year?
SELECT Trunc(EMP_JN_DT,'MM') Emp_Jn_Mth,
Count(*)
FROM EMP_REG
WHERE EMP_JN_DT between date '2009-01-01' AND date '2009-12-31'
GROUP BY Trunc(EMP_JN_DT,'MM')
ORDER BY 1;
If nobody joined in a particular month, no row is returned for it. To overcome this, you'd have to outer join the above to a list of months in the required year.
SELECT to_char(EMP_JN_DT,'MON') "Month", COUNT(EMP_NO) "Employees"
FROM EMP_REG
WHERE EMP_JN_DT between date '2009-01-01' AND date '2009-12-31'
GROUP BY to_char(EMP_JN_DT,'MON')
ORDER BY MIN(EMP_JN_DT); -- calendar order rather than alphabetical
http://www.techonthenet.com/oracle/functions/extract.php
The EXTRACT function returns the month of a date. All you need to do is put it in the GROUP BY.
For example, the number of employees who joined in January can be selected in the following way:
SELECT EXTRACT(MONTH FROM HIREDATE) AS MONTH1, COUNT(*)
FROM employee
WHERE EXTRACT(MONTH FROM HIREDATE)=1
GROUP BY EXTRACT(MONTH FROM HIREDATE)