Monthly Total Count from Multiple Tables PostgreSQL - sql

I have a search set up which gives the total count of new patient visits and the total count of patient visits, and compares the totals for the requested year to the previous year's totals.
The SQL queries the date fields firstexam and lastexam from the table patient_info.
I have since found out that some users do not update the lastexam with every patient visit, and therefore the lastexam would not give the total number of patient visits.
Total number of patient visits can be obtained by searching the transactions table. Invoices in the transaction table are marked with the column transtype as 'Inv'. So, the total number of patient visits would be the total number of invoices in the date range (taking into account that two invoices entered for a patient in a single day count as one visit).
Below is the code for the SQL query set up based on firstexam and lastexam.
I have been struggling with this for some time now and am stuck. Any help would be greatly appreciated.
select
to_char(('2012-' || m || '-01')::date, 'Month'),
thisyear, lastyear, totalthisyear, totallastyear
from (
select
extract(month from m) as m,
sum(case
when firstexam between '2013-01-01' and '2013-12-31' then firstexam_count
else 0 end
) as thisyear,
sum(case
when firstexam between '2012-01-01' and '2012-12-31' then firstexam_count
else 0 end
) as lastyear,
sum(case
when lastexam between '2013-01-01' and '2013-12-31' then lastexam_count
else 0 end
) as totalthisyear,
sum(case
when lastexam between '2012-01-01' and '2012-12-31' then lastexam_count
else 0 end
) as totallastyear
from
generate_series (
'2012-01-01'::date, '2013-12-31', '1 month'
) g(m)
left join (
select count(*) as firstexam_count, date_trunc('month', firstexam) as firstexam
from patient_info
where firstexam between '2012-01-01' and '2013-12-31'
group by 2
) pif on firstexam = m
left join (
select count(*) as lastexam_count, date_trunc('month', lastexam) as lastexam
from patient_info
where lastexam between '2012-01-01' and '2013-12-31'
group by 2
) pil on lastexam = m
group by 1
) s
order by m

If you want to report information about exams, you ought to store information about exams.
More specifically, if you want to count exams, you ought to store information about each exam.
Don't use column names like "thisyear" and "lastyear". This year isn't 2013, although that's how you present it.
Usually, visits and exams are different things. Be careful with terminology. (Here it's not such a big deal, because we don't have information about either visits or exams. Only about invoices. Still, it's a good habit.)
If you're concerned about a particular output format, ask yourself whether you're building a query or a report. Build queries in SQL. Build reports with a report writer or application code.
For simplicity, I'm going to ignore the "patient_info" table, ignore the outer join you need in order to generate zeroes for months in which there were no exams, and use common table expressions. (In production I'd rather use views than common table expressions.)
Let's start with just a table of transactions.
create table transactions (
ptnumber INT,
dateofservice date,
transtype varchar(3)
);
-- Not quite the same data you started with.
insert into transactions (ptnumber, dateofservice, transtype)
values
(1, '2012-01-01', 'Inv'),
(1, '2012-02-11', 'Inv'),
(2, '2012-01-02', 'Inv'),
(3, '2013-01-01', 'Inv'),
(4, '2013-02-12', 'Inv'),
(5, '2012-12-31', 'Inv'),
(5, '2013-12-31', 'Inv'),
(5, '2013-12-31', 'Inv'),
(6, '2013-06-21', 'Inv');
You said "two invoices entered for a patient in a single day count as one [exam]". I guess that means two or more. So we can extract the set of patient exams like this. I expect two rows for patient 5--one in 2012 and one in 2013.
select distinct ptnumber, dateofservice
from transactions
where transtype = 'Inv'
and dateofservice between '2012-01-01' and '2013-12-31'
order by ptnumber;
ptnumber dateofservice
--
1 2012-01-01
1 2012-02-11
2 2012-01-02
3 2013-01-01
4 2013-02-12
5 2012-12-31
5 2013-12-31
6 2013-06-21
This is the key to your whole problem--a set of distinct patient exams over a defined range of dates. Based on this set, counting patient visits by month is straightforward. (Counting them every which way is straightforward.)
with patient_exams as (
select distinct ptnumber, dateofservice
from transactions
where transtype = 'Inv'
and dateofservice between '2012-01-01' and '2013-12-31'
)
select to_char(dateofservice, 'YYYY-MM') as month_of_service, count(*) as num_patient_exams
from patient_exams
group by 1
order by 1;
month_of_service num_patient_exams
--
2012-01 2
2012-02 1
2012-12 1
2013-01 1
2013-02 1
2013-06 1
2013-12 1
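The outer join that was skipped above for simplicity can be sketched as follows. This is only an illustration run through Python's sqlite3 module, not PostgreSQL: the recursive "months" CTE stands in for generate_series, and strftime stands in for to_char. The sample rows are the ones inserted earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table transactions (ptnumber int, dateofservice text, transtype text);
insert into transactions values
  (1, '2012-01-01', 'Inv'), (1, '2012-02-11', 'Inv'), (2, '2012-01-02', 'Inv'),
  (3, '2013-01-01', 'Inv'), (4, '2013-02-12', 'Inv'), (5, '2012-12-31', 'Inv'),
  (5, '2013-12-31', 'Inv'), (5, '2013-12-31', 'Inv'), (6, '2013-06-21', 'Inv');
""")

# "months" is a SQLite stand-in for PostgreSQL's generate_series:
# one row per month start from 2012-01-01 through 2013-12-01.
MONTHLY_VISITS = """
with recursive months(month_start) as (
    select '2012-01-01'
    union all
    select date(month_start, '+1 month') from months
    where month_start < '2013-12-01'
),
patient_exams as (
    select distinct ptnumber, dateofservice
    from transactions
    where transtype = 'Inv'
      and dateofservice between '2012-01-01' and '2013-12-31'
)
select strftime('%Y-%m', m.month_start) as month_of_service,
       count(e.ptnumber) as num_patient_exams  -- counts only matched rows
from months m
left join patient_exams e
       on strftime('%Y-%m', e.dateofservice) = strftime('%Y-%m', m.month_start)
group by 1
order by 1
"""

monthly_visits = dict(conn.execute(MONTHLY_VISITS).fetchall())
for month, n in monthly_visits.items():
    print(month, n)
```

Counting e.ptnumber rather than count(*) is what turns an unmatched month into 0 instead of 1.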
First exams
Again, start by deriving a set that will give you reliable counts. You want one row per patient, and you want the earliest invoice date. The date of a patient's first exam has nothing to do with the date range you want to report; including the date range in this query's WHERE clause will give you the wrong data.
select ptnumber, min(dateofservice) as first_exam_date
from transactions
where transtype = 'Inv'
group by ptnumber
order by ptnumber;
ptnumber first_exam_date
--
1 2012-01-01
2 2012-01-02
3 2013-01-01
4 2013-02-12
5 2012-12-31
6 2013-06-21
Now counting how many new patients you gained each month is straightforward.
with first_exams as (
select ptnumber, min(dateofservice) as first_exam_date
from transactions
where transtype = 'Inv'
group by ptnumber
)
select to_char(first_exam_date, 'YYYY-MM') exam_month, count(*) num_first_exams
from first_exams
where first_exam_date between '2012-01-01' and '2013-12-31'
group by 1
order by 1;
exam_month num_first_exams
--
2012-01 2
2012-12 1
2013-01 1
2013-02 1
2013-06 1
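To get back to the side-by-side layout in the question (one row per calendar month, one column per year), the two derived sets can be combined with conditional aggregation. The sketch below is an assumption-laden illustration using Python's sqlite3 module rather than PostgreSQL; the column names new_2013, new_2012, visits_2013 and visits_2012 are made up for this example, and months with no activity are simply omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table transactions (ptnumber int, dateofservice text, transtype text);
insert into transactions values
  (1, '2012-01-01', 'Inv'), (1, '2012-02-11', 'Inv'), (2, '2012-01-02', 'Inv'),
  (3, '2013-01-01', 'Inv'), (4, '2013-02-12', 'Inv'), (5, '2012-12-31', 'Inv'),
  (5, '2013-12-31', 'Inv'), (5, '2013-12-31', 'Inv'), (6, '2013-06-21', 'Inv');
""")

# A visit is "new" when it falls on the patient's earliest invoice date.
# The date range is applied only to the visits, never to first_exams.
YEAR_OVER_YEAR = """
with patient_exams as (
    select distinct ptnumber, dateofservice
    from transactions
    where transtype = 'Inv'
),
first_exams as (
    select ptnumber, min(dateofservice) as first_exam_date
    from patient_exams
    group by ptnumber
)
select strftime('%m', e.dateofservice) as month_number,
       sum(case when e.dateofservice = f.first_exam_date
                 and strftime('%Y', e.dateofservice) = '2013' then 1 else 0 end) as new_2013,
       sum(case when e.dateofservice = f.first_exam_date
                 and strftime('%Y', e.dateofservice) = '2012' then 1 else 0 end) as new_2012,
       sum(case when strftime('%Y', e.dateofservice) = '2013' then 1 else 0 end) as visits_2013,
       sum(case when strftime('%Y', e.dateofservice) = '2012' then 1 else 0 end) as visits_2012
from patient_exams e
join first_exams f on f.ptnumber = e.ptnumber
where e.dateofservice between '2012-01-01' and '2013-12-31'
group by 1
order by 1
"""

rows = conn.execute(YEAR_OVER_YEAR).fetchall()
for row in rows:
    print(row)
```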

Related

Retrieve Customers with a Monthly Order Frequency greater than 4

I am trying to optimize the query below to fetch all customers who placed 4 or more orders in each of the past three months.
Customer ID  Feb  Mar  Apr
0001         4    5    6
0002         3    2    4
0003         4    2    3
In the above table, the customer with Customer ID 0001 should only be picked, as he consistently has 4 or more orders in a month.
Below is the query I have written. It pulls all customers with an average purchase frequency of 4 in the last 90 days, but it does not check whether there were consistently 4 or more purchases in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE LOWER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried to use window functions, however not sure if it needs to be used in this case.
I would count the orders per month instead of computing the average, and then retrieve the customers whose count is 4 or more in each of the last three months.
Also, I think you should select your interval using "month(CURRENT_DATE()) - 3" instead of a 90-day window. Of course, you would need to handle the case where the current date falls in January, February, or March, and in that case go back to October, November, or December of the previous year.
I'm not familiar with Google BigQuery so I can't write your query but I hope this helps.
So I've found the solution to this using a WITH clause, as below:
WITH filtered_orders AS (
select
distinct customer_id ID,
extract(MONTH from date) Order_Month,
count(order_id) CountofOrders
from customer_order_lines lines
where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
group by ID, Order_Month
having CountofOrders>=4)
select distinct ID
from filtered_orders
group by ID
having count(Order_Month) =3;
Hope this helps!
One option is to first count the orders by month and then keep only the users whose purchase count meets your threshold in every month in which they ordered:
WITH ORDERS_BY_MONTH AS (
SELECT
DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
lines.customer_id Customer_ID,
COUNT(lines.order_id) PurchaseFrequency
FROM fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY PurchaseMonth, Customer_ID
)
SELECT
Customer_ID,
AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
HAVING COUNT(1) = COUNTIF(PurchaseFrequency >= 4)
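A customer who skips a month entirely has no row for that month, so a stricter check is to require exactly three monthly rows, each with 4 or more orders. The following is a minimal sketch of that idea using Python's sqlite3 module, not BigQuery. The order_lines table, its columns, and the fixed February to April 2022 window are assumptions for illustration; the rows are generated to match the counts in the question's table.

```python
import sqlite3

# Hypothetical order rows built to match the counts in the question's table.
monthly_counts = {
    "0001": {"2022-02": 4, "2022-03": 5, "2022-04": 6},
    "0002": {"2022-02": 3, "2022-03": 2, "2022-04": 4},
    "0003": {"2022-02": 4, "2022-03": 2, "2022-04": 3},
}
rows = [
    (customer, f"{month}-{day:02d}")
    for customer, months in monthly_counts.items()
    for month, n in months.items()
    for day in range(1, n + 1)
]

conn = sqlite3.connect(":memory:")
conn.execute("create table order_lines (customer_id text, order_date text)")
conn.executemany("insert into order_lines values (?, ?)", rows)

# Step 1: orders per customer per month inside the three-month window.
# Step 2: keep customers with a row for all 3 months and at least 4 orders in each.
CONSISTENT = """
with orders_by_month as (
    select customer_id,
           strftime('%Y-%m', order_date) as order_month,
           count(*) as num_orders
    from order_lines
    where order_date >= '2022-02-01' and order_date < '2022-05-01'
    group by customer_id, order_month
)
select customer_id
from orders_by_month
group by customer_id
having count(*) = 3 and min(num_orders) >= 4
order by customer_id
"""

consistent_customers = [r[0] for r in conn.execute(CONSISTENT)]
print(consistent_customers)
```

Using whole calendar months for the window avoids the partial months that a rolling 90-day range produces.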

Weekly active or lapsing status in BigQuery

I want to see the status of a customer each week based on their activity.
If a customer has transacted in the last 7 days it should appear as active and if the customer has not transacted in 8-21 days it should appear as "lapsing".
I have these values in my table:
[image: table values]
Desired output reference:
Week# Customer_id Status
If you want a row for every combination of week and customer_id, you could create a large cross join of the distinct combinations of those two from your orders table, then match all orders back into that superset keeping the latest (that's before that date).
with base_table as (
select distinct customer_id, week_date
from orders
cross join (SELECT week_date FROM UNNEST(GENERATE_DATE_ARRAY((select min(order_date) from orders), CURRENT_DATE(), INTERVAL 7 DAY)) AS week_date)
)
select base_table.customer_id, base_table.week_date, max(order_date) as latest_order,
case
when DATE_DIFF(week_date,max(order_date),DAY) <= 7 then 'active'
when DATE_DIFF(week_date,max(order_date),DAY) >= 8 and DATE_DIFF(week_date,max(order_date),DAY) <= 21 then 'lapsing'
else 'not active'
end as status
from base_table
cross join orders
where orders.customer_id = base_table.customer_id
and order_date <= week_date
group by 1, 2
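The classification rule in the CASE expression can be checked outside the database. The sketch below is plain Python with made-up order dates; the function names status_on and weekly_statuses are hypothetical. It uses the same thresholds as the query above (7 days or fewer is active, 8 to 21 days is lapsing) and returns None for weeks before the customer's first order, which the query drops.

```python
from bisect import bisect_right
from datetime import date, timedelta


def status_on(week_date, order_dates):
    """Classify a customer as of week_date; order_dates must be sorted."""
    i = bisect_right(order_dates, week_date)
    if i == 0:
        return None  # no order on or before this date yet
    days_since = (week_date - order_dates[i - 1]).days
    if days_since <= 7:
        return "active"
    if days_since <= 21:
        return "lapsing"
    return "not active"


def weekly_statuses(order_dates, first_week, last_week):
    """Return (week_date, status) pairs at 7-day steps."""
    order_dates = sorted(order_dates)
    out = []
    week = first_week
    while week <= last_week:
        out.append((week, status_on(week, order_dates)))
        week += timedelta(days=7)
    return out


# Made-up example: two orders, then silence.
orders = [date(2022, 1, 3), date(2022, 1, 20)]
statuses = weekly_statuses(orders, date(2022, 1, 3), date(2022, 2, 21))
for week, status in statuses:
    print(week, status)
```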

How to count only the working days between two dates?

I have the following table called vacations, where the employee number is displayed along with the start and end date of their vacations:
id_employee  start       end
1001         2020-12-24  2021-01-04
What I am looking for is to visualize the amount of vacation days that each employee had, but separating them by employee number, month, year and number of days; without taking into account non-business days (Saturdays, Sundays and holidays).
I have the following query, which manages to omit Saturday and Sunday from the posting:
SELECT id_employee,
EXTRACT(YEAR FROM t.Date) AS year,
EXTRACT(MONTH FROM t.Date) AS month,
SUM(WEEKDAY(`Date`) < 5) AS days
FROM (SELECT v.id_employee,
DATE_ADD(v.start, interval s.seq - 1 DAY) AS Date
FROM vacations v CROSS JOIN seq_1_to_100 s
WHERE DATE_ADD(v.start, interval s.seq - 1 DAY) <= v.end
ORDER BY v.id_employee, v.start, s.seq ) t
GROUP BY id_employee, EXTRACT(YEAR_MONTH FROM t.Date);
My question is: in addition to skipping the weekends, how could I also skip the holidays? I suppose that I should create another table that stores the dates of those holidays, but how could my query be adapted to perform the comparison?
If we consider that the employee 1001 took his vacations from 2020-12-24 to 2021-01-04 and we take Christmas and New Years as holidays, we should get the following result:
id_employee  month  year  days
1001         12     2020  5
1001         1      2021  1
After you have created a table that stores the holiday dates, then you probably can do something like this:
SELECT id_employee,
EXTRACT(YEAR FROM t.Date) AS year,
EXTRACT(MONTH FROM t.Date) AS month,
SUM(CASE WHEN h.holiday_date IS NULL THEN WEEKDAY(`Date`) < 5 END) AS days
FROM (SELECT v.id_employee,
DATE_ADD(v.start, interval s.seq - 1 DAY) AS Date
FROM vacations v CROSS JOIN seq_1_to_100 s
WHERE DATE_ADD(v.start, interval s.seq - 1 DAY) <= v.end
ORDER BY v.id_employee, v.start, s.seq ) t
LEFT JOIN holidays h ON t.date=h.holiday_date
GROUP BY id_employee, EXTRACT(YEAR_MONTH FROM t.Date);
Assuming that the holidays table structure would be something like this:
CREATE TABLE holidays (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
holiday_date DATE,
holiday_description VARCHAR(255));
Then LEFT JOIN it to your current query and change the SUM() slightly by adding a CASE expression as a check. If the condition ON t.date=h.holiday_date in the left join matches, h.holiday_date will hold a value; otherwise it will be NULL. Hence only the rows that satisfy CASE WHEN h.holiday_date IS NULL .. are counted.
Adding this solution, which is compatible with both the MariaDB and MySQL versions that support common table expressions:
WITH RECURSIVE cte AS
(SELECT id_employee, start, start lvdt, end FROM vacations
UNION ALL
SELECT id_employee, start, lvdt+INTERVAL 1 DAY, end FROM cte
WHERE lvdt+INTERVAL 1 DAY <=end)
SELECT id_employee,
YEAR(v.lvdt) AS year,
MONTH(v.lvdt) AS month,
SUM(CASE WHEN h.holiday_date IS NULL THEN WEEKDAY(v.lvdt) < 5 END) AS days
FROM cte v
LEFT JOIN holidays h
ON v.lvdt=h.holiday_date
GROUP BY id_employee,
YEAR(v.lvdt),
MONTH(v.lvdt);
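The expected figures from the question can be cross-checked outside the database. This is a small plain-Python sketch, not part of either query; the function name working_days_by_month is made up, and the holiday set contains only the two dates named in the question. Python's date.weekday() numbers Monday as 0, the same convention as WEEKDAY() in MySQL and MariaDB.

```python
from collections import Counter
from datetime import date, timedelta


def working_days_by_month(start, end, holidays):
    """Count Monday-to-Friday, non-holiday days per (year, month), inclusive of both ends."""
    counts = Counter()
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in holidays:
            counts[(day.year, day.month)] += 1
        day += timedelta(days=1)
    return dict(counts)


# Christmas and New Year's Day, as assumed in the question.
holidays = {date(2020, 12, 25), date(2021, 1, 1)}
result = working_days_by_month(date(2020, 12, 24), date(2021, 1, 4), holidays)
print(result)
```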

SQL Comparison Query Error

I have a table with transaction history for 3 years, I need to compare the sum ( transaction) for 12 months with sum( transaction) for 4 weeks and display the customer list with the result set.
Table Transaction_History
Customer_List Transaction Date
1 200 01/01/2014
2 200 01/01/2014
1 100 10/24/2014
1 100 11/01/2014
2 200 11/01/2014
The output should have only Customer_List with 1 because sum of 12 months transactions equals sum of 1 month transaction.
I am confused about how to find the sum for 12 months and then compare with same table sum for 4 weeks.
The query below will work, except that your sample data doesn't make sense:
total for customer 1 for the last 12 months in your data set = 400
total for customer 1 for the last 4 weeks in your data set = 200
Unless you want to exclude the last 4 weeks, so that they are not a part of the last 12 months?
In that case you would change the HAVING clause to:
having
sum(case when Dt >= '01/01/2014' and dt <='12/31/2014' then (trans) end) - sum(case when Dt >= '10/01/2014' and dt <= '11/02/2014' then (trans) end) =
sum(case when Dt >= '10/01/2014' and dt <= '11/02/2014' then (trans) end)
Of course, doing this would mean your results would be customers 1 and 2.
create table #trans_hist
(Customer_List int, Trans int, Dt Date)
insert into #trans_hist (Customer_List, Trans , Dt ) values
(1, 200, '01/01/2014'),
(2, 200, '01/01/2014'),
(1, 100, '10/24/2014'),
(1, 100, '11/01/2014'),
(2, 200, '11/01/2014')
select
Customer_List
from #trans_hist
group by
Customer_List
having
sum(case when Dt >= '01/01/2014' and dt <='12/31/2014' then (trans) end) =
sum(case when Dt >= '10/01/2014' and dt <= '11/02/2014' then (trans) end)
drop table #trans_hist
I suggest a self join.
select yourfields
from yourtable twelvemonths join yourtable fourweeks on something
where fourweeks.something is within a four week period
and twelvemonths.something is within a 12 month period
You should be able to work out the details.
If your transactions are always positive and you want customers whose 12-month total equals the 4-week total, then you want customers who have transactions in the past four weeks but none in the rest of the preceding 12 months.
You can get this more directly using aggregation and a having clause. The logic is to check for any transactions in the past year that occurred before the previous 4 weeks:
select Customer_List
from Transaction_History
where date >= dateadd(month, -12, getdate())
group by Customer_List
having min(date) >= dateadd(day, -4 * 7, getdate());
Look here for methods to aggregate by month, year, etc.
http://weblogs.sqlteam.com/jeffs/archive/2007/09/10/group-by-month-sql.aspx
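If amounts can be negative, comparing the two sums directly is safer than checking the earliest date. The sketch below runs in Python's sqlite3 module rather than SQL Server, so date() replaces dateadd(). The rows are hypothetical (not the sample from the question), the reporting date AS_OF is fixed so the result is repeatable, and the column names amount and dt are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table transaction_history (customer_list int, amount int, dt text);
-- Hypothetical rows, chosen so that exactly one customer qualifies.
insert into transaction_history values
  (1, 100, '2014-10-24'),
  (1, 100, '2014-11-01'),
  (2, 200, '2014-01-01'),
  (2, 200, '2014-11-01'),
  (3, 300, '2013-06-15');
""")

AS_OF = "2014-11-02"  # assumed reporting date, stands in for getdate()

# Keep only the last 12 months, then compare the full sum with the
# sum restricted to the last 4 weeks (28 days).
MATCHING_TOTALS = """
select customer_list
from transaction_history
where dt > date(:as_of, '-12 months') and dt <= :as_of
group by customer_list
having sum(amount) =
       sum(case when dt > date(:as_of, '-28 days') then amount else 0 end)
order by customer_list
"""

matching = [r[0] for r in conn.execute(MATCHING_TOTALS, {"as_of": AS_OF})]
print(matching)
```

Customer 3 never appears because all of its rows fall outside the 12-month window.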

How to calculate retention month over month using SQL

Trying to get a basic table that shows retention from one month to the next. So if someone buys something last month and they do so the next month it gets counted.
month, num_transactions, repeat_transactions, retention
2012-02, 5, 2, 40%
2012-03, 10, 3, 30%
2012-04, 15, 8, 53%
So if everyone that bought last month bought again the following month you have 100%.
So far I can only calculate stuff manually. This gives me the rows that have been seen in both months:
select count(*) as num_repeat_buyers from
(select distinct
to_char(transaction.timestamp, 'YYYY-MM') as month,
auth_user.email
from
auth_user,
transaction
where
auth_user.id = transaction.buyer_id and
to_char(transaction.timestamp, 'YYYY-MM') = '2012-03'
) as table1,
(select distinct
to_char(transaction.timestamp, 'YYYY-MM') as month,
auth_user.email
from
auth_user,
transaction
where
auth_user.id = transaction.buyer_id and
to_char(transaction.timestamp, 'YYYY-MM') = '2012-04'
) as table2
where table1.email = table2.email
This is not right, but I feel like I can use some of Postgres' windowing functions. Keep in mind the windowing functions don't let you specify WHERE clauses. You mostly have access to the preceding and following rows:
select month, count(*) as num_transactions, count(*) over (PARTITION BY month ORDER BY month)
from
(select distinct
to_char(transaction.timestamp, 'YYYY-MM') as month,
auth_user.email
from
auth_user,
transaction
where
auth_user.id = transaction.buyer_id
order by
month
) as transactions_by_month
group by
month
Given the following test table (which you should have provided):
CREATE TEMP TABLE transaction (buyer_id int, tstamp timestamp);
INSERT INTO transaction VALUES
(1,'2012-01-03 20:00')
,(1,'2012-01-05 20:00')
,(1,'2012-01-07 20:00') -- multiple transactions this month
,(1,'2012-02-03 20:00') -- next month
,(1,'2012-03-05 20:00') -- next month
,(2,'2012-01-07 20:00')
,(2,'2012-03-07 20:00') -- not next month
,(3,'2012-01-07 20:00') -- just once
,(4,'2012-02-07 20:00'); -- just once
Table auth_user is not relevant to the problem.
Using tstamp as column name since I don't use base types as identifiers.
I am going to use the window function lag() to identify repeated buyers. To keep it short I combine aggregate and window functions in one query level. Bear in mind that window functions are applied after aggregate functions.
WITH t AS (
SELECT buyer_id
,date_trunc('month', tstamp) AS month
,count(*) AS item_transactions
,lag(date_trunc('month', tstamp)) OVER (PARTITION BY buyer_id
ORDER BY date_trunc('month', tstamp))
= date_trunc('month', tstamp) - interval '1 month'
OR NULL AS repeat_transaction
FROM transaction
WHERE tstamp >= '2012-01-01'::date
AND tstamp < '2012-05-01'::date -- time range of interest.
GROUP BY 1, 2
)
SELECT month
,sum(item_transactions) AS num_trans
,count(*) AS num_buyers
,count(repeat_transaction) AS repeat_buyers
,round(
CASE WHEN sum(item_transactions) > 0
THEN count(repeat_transaction) / sum(item_transactions) * 100
ELSE 0
END, 2) AS buyer_retention_pct
FROM t
GROUP BY 1
ORDER BY 1;
Result:
month | num_trans | num_buyers | repeat_buyers | buyer_retention_pct
---------+-----------+------------+---------------+--------------------
2012-01 | 5 | 3 | 0 | 0.00
2012-02 | 2 | 2 | 1 | 50.00
2012-03 | 2 | 2 | 1 | 50.00
I extended your question to provide for the difference between the number of transactions and the number of buyers.
The OR NULL for repeat_transaction serves to convert FALSE to NULL, so those values do not get counted by count() in the next step.
This uses CASE and EXISTS to get repeated transactions:
SELECT
*,
CASE
WHEN num_transactions = 0
THEN 0
ELSE round(100.0 * repeat_transactions / num_transactions, 2)
END AS retention
FROM
(
SELECT
to_char(timestamp, 'YYYY-MM') AS month,
count(*) AS num_transactions,
sum(CASE
WHEN EXISTS (
SELECT 1
FROM transaction AS t
JOIN auth_user AS u
ON t.buyer_id = u.id
WHERE
date_trunc('month', transaction.timestamp)
+ interval '1 month'
= date_trunc('month', t.timestamp)
AND auth_user.email = u.email
)
THEN 1
ELSE 0
END) AS repeat_transactions
FROM
transaction
JOIN auth_user
ON transaction.buyer_id = auth_user.id
GROUP BY 1
) AS summary
ORDER BY 1;
EDIT: Changed from minus 1 month to plus 1 month after reading the question again. My understanding now is that if someone buys something in 2012-02, and then buys something again in 2012-03, then his or her transactions in 2012-02 are counted as retention for the month.
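The answers above use different denominators. Under one reading of the question ("if everyone that bought last month bought again the following month you have 100%"), retention for a month is the share of the previous month's buyers who bought again. The plain-Python sketch below illustrates that reading only, using the test rows from the first answer; the helper name month_index is made up.

```python
from collections import defaultdict
from datetime import datetime

# Same rows as the test table above: (buyer_id, tstamp).
transactions = [
    (1, "2012-01-03 20:00"), (1, "2012-01-05 20:00"), (1, "2012-01-07 20:00"),
    (1, "2012-02-03 20:00"), (1, "2012-03-05 20:00"),
    (2, "2012-01-07 20:00"), (2, "2012-03-07 20:00"),
    (3, "2012-01-07 20:00"), (4, "2012-02-07 20:00"),
]


def month_index(ts):
    """Map a timestamp string to a running month number so that m - 1 is the previous month."""
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M")
    return t.year * 12 + (t.month - 1)


# Distinct buyers per month.
buyers_by_month = defaultdict(set)
for buyer_id, ts in transactions:
    buyers_by_month[month_index(ts)].add(buyer_id)

# Retention = buyers seen in both months / buyers in the previous month.
retention = {}
for m in sorted(buyers_by_month):
    previous = buyers_by_month.get(m - 1)
    if not previous:
        continue  # nothing to retain from an empty previous month
    returning = buyers_by_month[m] & previous
    label = f"{m // 12}-{m % 12 + 1:02d}"
    retention[label] = round(100 * len(returning) / len(previous), 2)

print(retention)
```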