Total brokers with at least one purchase weekly in Postgres SQL

companies:
cid   Company name
1     cname 1
2     cname2

brokers:
bid   cid   broker name
1     2     broker 1
2     1     broker 2

purchases:
pid   bid   purchase date
1     1     2021-05-01 00:20:30
2     2     2021-05-02 13:20:30
I have the above tables. I would like to fetch weekly data of brokers with at least one purchase in the week:
week start date        No. of brokers
2021-04-03 00:00:00    5
2021-04-10 00:00:00    20
I would also like to fetch weekly data of companies with at least one purchase in the week:
week start date        No. of companies
2021-04-03 00:00:00    5
2021-04-10 00:00:00    20
How can I write these Postgres SQL queries?

You can use date_trunc() and then count the brokers and companies using count(distinct):
select date_trunc('week', p.purchase_date) as week,
       count(distinct p.bid) as num_brokers,
       count(distinct b.cid) as num_companies
from purchases p join
     brokers b
     on p.bid = b.bid
group by week;
You would use count(*) to get the number of purchases.
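If you want to sanity-check the shape of this query without a Postgres instance, here is a sketch using Python's sqlite3. Table and column names follow the answer; SQLite has no date_trunc(), so the sketch groups by an ISO year-week label instead of a week-start timestamp.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table brokers (bid int, cid int, broker_name text);
    create table purchases (pid int, bid int, purchase_date text);
    insert into brokers values (1, 2, 'broker 1'), (2, 1, 'broker 2');
    insert into purchases values
        (1, 1, '2021-05-01 00:20:30'),
        (2, 2, '2021-05-02 13:20:30');
""")

# strftime('%Y-%W', ...) stands in for date_trunc('week', ...):
# both sample purchases fall in the same Monday-based week
rows = con.execute("""
    select strftime('%Y-%W', p.purchase_date) as week,
           count(distinct p.bid) as num_brokers,
           count(distinct b.cid) as num_companies
    from purchases p
    join brokers b on p.bid = b.bid
    group by week
""").fetchall()
```

With the two sample purchases landing in one week, the query returns a single row counting both brokers and both companies.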

Related

How to calculate the average monthly number of some action in some period in Teradata SQL?

I have table in Teradata SQL like below:
ID trans_date
------------------------
123 | 2021-01-01
887 | 2021-01-15
123 | 2021-02-10
45 | 2021-03-11
789 | 2021-10-01
45 | 2021-09-02
And I need to calculate the average monthly number of transactions made by customers in the period between 2021-01-01 and 2021-09-01, so the client with "ID" = 789 will not be counted because he made his transaction later.
In the first month (01) were 2 transactions
In the second month was 1 transaction
In the third month was 1 transaction
In the ninth month (09) was 1 transaction
So the result should be (2+1+1+1) / 4 = 1.25, shouldn't it?
How can I calculate it in Teradata SQL? Of course, I have only shown a sample of my data.
SELECT ID, AVG(txns) FROM
(SELECT ID, TRUNC(trans_date,'MON') as mth, COUNT(*) as txns
FROM mytable
-- WHERE condition matches the question but likely want to
-- use end date 2021-09-30 or use mth instead of trans_date
WHERE trans_date BETWEEN date'2021-01-01' and date'2021-09-01'
GROUP BY id, mth) mth_txn
GROUP BY id;
Your logic translated to SQL:
--(2+1+1+1) / 4
SELECT id, COUNT(*) / COUNT(DISTINCT TRUNC(trans_date,'MON')) AS avg_tx
FROM mytable
WHERE trans_date BETWEEN date'2021-01-01' and date'2021-09-01'
GROUP BY id;
You should compare with Fred's answer to see which is more efficient on your data.
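As a sanity check of the (2+1+1+1)/4 arithmetic, here is a sketch of the second query's idea using Python's sqlite3; strftime('%Y-%m', ...) stands in for TRUNC(trans_date,'MON'), and the window ends at 2021-09-30 (as the code comment above suggests) so the 2021-09-02 row is included while 789's October row is not.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table mytable (id int, trans_date text)")
con.executemany("insert into mytable values (?, ?)", [
    (123, "2021-01-01"), (887, "2021-01-15"), (123, "2021-02-10"),
    (45, "2021-03-11"), (789, "2021-10-01"), (45, "2021-09-02"),
])

# total transactions divided by the number of distinct active months;
# the * 1.0 forces real division, as in most SQL dialects
row = con.execute("""
    select count(*) * 1.0 / count(distinct strftime('%Y-%m', trans_date))
    from mytable
    where trans_date between '2021-01-01' and '2021-09-30'
""").fetchone()
```

Five transactions over four distinct months gives the 1.25 from the question.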

how to calculate occupancy on the basis of admission and discharge dates

Suppose I have patient admission/claim-wise data like the sample below. The data type of the patient_id and hosp_id columns is VARCHAR.
Table name: claims

rec_no  patient_id  hosp_id  admn_date   discharge_date
1       1           1        01-01-2020  10-01-2020
2       2           1        31-12-2019  11-01-2020
3       1           1        11-01-2020  15-01-2020
4       3           1        04-01-2020  10-01-2020
5       1           2        16-01-2020  17-01-2020
6       4           2        01-01-2020  10-01-2020
7       5           2        02-01-2020  11-01-2020
8       6           2        03-01-2020  12-01-2020
9       7           2        04-01-2020  13-01-2020
10      2           1        31-12-2019  10-01-2020
I have another table in which the bed strength/max occupancy of each hospital is stored.
Table name: beds

hosp_id  bed_strength
1        3
2        4
Expected results: I want to find out, hospital-wise, the dates on which the declared bed strength was exceeded.
Code I have tried: Nothing, as I am new to SQL. However, I can solve this in R with the following strategy:
pivot_longer the dates
tidyr::complete() missing dates in between
summarise or aggregate results for each date.
Simultaneously, I also want to know whether it can be done without pivoting in SQL, because the claims table has 15+ million rows and pivoting really slows down the process. Please help.
You can use generate_series() to do something very similar in Postgres. For the occupancy by date:
select c.hosp_id, gs.date, count(*) as occupancy
from claims c cross join lateral
generate_series(admn_date, discharge_date, interval '1 day') gs(date)
group by c.hosp_id, gs.date;
Then use this as a subquery to get the dates that exceed the threshold:
select hd.*, b.bed_strength
from (select c.hosp_id, gs.date, count(*) as occupancy
from claims c cross join lateral
generate_series(c.admn_date, c.discharge_date, interval '1 day') gs(date)
group by c.hosp_id, gs.date
) hd join
beds b
using (hosp_id)
where hd.occupancy > b.bed_strength;
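The expand-and-count idea behind generate_series() can be sanity-checked in plain Python. The sketch below hard-codes the sample rows (with the DD-MM-YYYY strings converted to date objects) and flags days where occupancy exceeds bed strength:

```python
from collections import Counter
from datetime import date, timedelta

# claims rows as (rec_no, patient_id, hosp_id, admn_date, discharge_date)
claims = [
    (1, 1, 1, date(2020, 1, 1), date(2020, 1, 10)),
    (2, 2, 1, date(2019, 12, 31), date(2020, 1, 11)),
    (3, 1, 1, date(2020, 1, 11), date(2020, 1, 15)),
    (4, 3, 1, date(2020, 1, 4), date(2020, 1, 10)),
    (5, 1, 2, date(2020, 1, 16), date(2020, 1, 17)),
    (6, 4, 2, date(2020, 1, 1), date(2020, 1, 10)),
    (7, 5, 2, date(2020, 1, 2), date(2020, 1, 11)),
    (8, 6, 2, date(2020, 1, 3), date(2020, 1, 12)),
    (9, 7, 2, date(2020, 1, 4), date(2020, 1, 13)),
    (10, 2, 1, date(2019, 12, 31), date(2020, 1, 10)),
]
beds = {1: 3, 2: 4}

# expand each stay into one (hosp_id, day) pair -- what generate_series() does
occupancy = Counter()
for rec_no, patient_id, hosp_id, admn, disch in claims:
    day = admn
    while day <= disch:
        occupancy[(hosp_id, day)] += 1
        day += timedelta(days=1)

# keep only the days where occupancy exceeds the declared bed strength
over = sorted((h, d) for (h, d), n in occupancy.items() if n > beds[h])
```

With this sample, only hospital 1 exceeds its strength, on 04-01-2020 through 10-01-2020 (four concurrent patients against a bed strength of 3).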

Counting subscriber numbers given events on SQL

I have a dataset in MySQL in the following format, showing the history of events for some client IDs:
Base Data
Text of the dataset (subscriber_table):
user_id type created_at
A past_due 2021-03-27 10:15:56
A reactivate 2021-02-06 10:21:35
A past_due 2021-01-27 10:30:41
A new 2020-10-28 18:53:07
A cancel 2020-07-22 9:48:54
A reactivate 2020-07-22 9:48:53
A cancel 2020-07-15 2:53:05
A new 2020-06-20 20:24:18
B reactivate 2020-06-14 10:57:50
B past_due 2020-06-14 10:33:21
B new 2020-06-11 10:21:24
date_table:
full_date
2020-05-01
2020-06-01
2020-07-01
2020-08-01
2020-09-01
2020-10-01
2020-11-01
2020-12-01
2021-01-01
2021-02-01
2021-03-01
I have been struggling to come up with a query to count subscribers over a range of months, which are not necessarily included in the event table, either because the client is still subscribed or because they cancelled and later resubscribed. The output I am looking for is this:
Output
date subscriber_count
2020-05-01 0
2020-06-01 2
2020-07-01 2
2020-08-01 1
2020-09-01 1
2020-10-01 2
2020-11-01 2
2020-12-01 2
2021-01-01 2
2021-02-01 2
2021-03-01 2
Reactivation and Past Due events do not change the subscription status of the client; only the Cancel and New events do. If the client cancels in a month, they should still be counted as active for that month.
My initial approach was to get the latest entry per month per subscriber ID and then join them to the premade date table, but when months are missing I am unsure how to fill them with the correct status. Maybe a lag function?
with last_record_per_month as (
select
date_trunc('month', created_at)::date as month_year,
row_number() over (partition by user_id, date_trunc('month', created_at) order by created_at desc) as num,
user_id ,
type,
created_at as created_at
from
subscriber_table
where
user_id in ('A', 'B')
order by
created_at desc
), final as (
select
month_year,
created_at,
type
from
last_record_per_month lrpm
right join (
select
date_trunc('month', full_date)::date as month_year
from
date_table
where
full_date between '2020-05-01' and '2021-03-31'
group by
1
order by
1
) dd
on lrpm.month_year = dd.month_year
and num = 1
order by
month_year
)
select
*
from
final
I do have a premade base table with every single date in many years to use as a joining table
Any help with this is GREATLY appreciated
Thanks!
The approach here is to take the subscriber rows with 'new' events as the base and map them to the 'cancel' rows using a self join. Then take the date table as the base and aggregate on the number of users to get the result.
SELECT full_date, COUNT(DISTINCT user_id) FROM date_tbl
LEFT JOIN(
SELECT new.user_id,new.type,new.created_at created_at_new,
IFNULL(cancel.created_at,CURRENT_DATE) created_at_cancel
FROM subscriber new
LEFT JOIN subscriber cancel
ON new.user_id=cancel.user_id
AND new.type='new' AND cancel.type='cancel'
AND new.created_at<= cancel.created_at
WHERE new.type IN('new'))s
ON DATE_FORMAT(s.created_at_new, '%Y-%m')<=DATE_FORMAT(full_date, '%Y-%m')
AND DATE_FORMAT(s.created_at_cancel, '%Y-%m')>=DATE_FORMAT(full_date, '%Y-%m')
GROUP BY 1
Let me break down some sections.
First up, we self-join the subscriber table on user_id, with the left side restricted to 'new' rows and the right side to 'cancel' rows: new.type='new' AND cancel.type='cancel'
The 'new' row should always precede the 'cancel' row, so we add new.created_at <= cancel.created_at
Since we only care about the 'new' rows in the base table, we filter in the WHERE clause: new.type IN ('new')
We then LEFT JOIN this subquery to the date table such that the year-month of created_at_new is less than or equal to full_date, DATE_FORMAT(s.created_at_new, '%Y-%m') <= DATE_FORMAT(full_date, '%Y-%m'), while the year-month of the cancel date is greater than or equal to it.
Lastly, we aggregate by full_date and count the distinct users.
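The interval logic (each 'new' opens a subscription that runs until the next 'cancel', or until today, with the cancel month still counting as active) can be restated in plain Python as a sanity check. Dates and the `today` cutoff are hard-coded from the sample; past_due/reactivate events are omitted since they do not change subscription status.

```python
from datetime import date

events = [  # (user_id, type, date) -- timestamps reduced to dates
    ("A", "new", date(2020, 6, 20)), ("A", "cancel", date(2020, 7, 15)),
    ("A", "cancel", date(2020, 7, 22)), ("A", "new", date(2020, 10, 28)),
    ("B", "new", date(2020, 6, 11)),
]
months = ([date(2020, m, 1) for m in range(5, 13)]
          + [date(2021, m, 1) for m in range(1, 4)])
today = date(2021, 3, 31)  # stand-in for CURRENT_DATE

counts = []
for m in months:
    active = set()
    for user, typ, d in events:
        if typ != "new":
            continue
        # earliest cancel on/after this 'new' event, else open-ended
        cancels = [cd for (u, t, cd) in events
                   if u == user and t == "cancel" and cd >= d]
        end = min(cancels) if cancels else today
        # month granularity: active from the 'new' month through the cancel month
        if (d.year, d.month) <= (m.year, m.month) <= (end.year, end.month):
            active.add(user)
    counts.append((m.isoformat(), len(active)))
```

This reproduces the expected output in the question: 0 in May 2020, 2 through July, 1 in August and September, then 2 from October onward.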

group by of one column and having count of another

I have a table 'customer' which contains 4 columns
name day product price
A 2021-04-01 p1 100
B 2021-04-01 p1 100
C 2021-04-01 p2 120
A 2021-04-01 p2 120
A 2021-04-02 p1 100
B 2021-04-02 p3 80
C 2021-04-03 p2 120
D 2021-04-03 p2 120
C 2021-04-04 p1 100
With a command
SELECT COUNT(name)
FROM (SELECT name
      FROM customer
      WHERE day > '2021-03-28'
        AND day < '2021-04-09'
      GROUP BY name
      HAVING COUNT(name) > 2) t;
I can count the number of customers that bought something more than twice in a period of time.
I would like to know, for each day (GROUP BY over day), how many customers bought something, subject to the condition that over the period they bought something more than twice.
Suggested edit:
For the above example, A and C are the valid customers by this condition.
The desired output will be:
day how_many
2021-04-01 2
2021-04-02 1
2021-04-03 1
2021-04-04 1
I interpret your question as wanting to know how many customers made more than one purchase on each day. If so, one method uses two levels of aggregation:
select day,
sum(case when day_count >= 2 then 1 else 0 end)
from (select c.name, c.day, count(*) as day_count
from customer c
group by c.name, c.day
) nc
group by day
order by day;
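For reference, the expected output in the question is a slightly different calculation: qualify customers over the whole period (more than two purchases), then count the qualifying customers seen each day. A sketch of that variant using Python's sqlite3, with table and column names taken from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table customer (name text, day text, product text, price int)")
con.executemany("insert into customer values (?, ?, ?, ?)", [
    ("A", "2021-04-01", "p1", 100), ("B", "2021-04-01", "p1", 100),
    ("C", "2021-04-01", "p2", 120), ("A", "2021-04-01", "p2", 120),
    ("A", "2021-04-02", "p1", 100), ("B", "2021-04-02", "p3", 80),
    ("C", "2021-04-03", "p2", 120), ("D", "2021-04-03", "p2", 120),
    ("C", "2021-04-04", "p1", 100),
])

# qualify customers over the whole period, then count them per day
rows = con.execute("""
    select day, count(distinct name) as how_many
    from customer
    where name in (select name
                   from customer
                   where day > '2021-03-28' and day < '2021-04-09'
                   group by name
                   having count(name) > 2)
    group by day
    order by day
""").fetchall()
```

Only A and C clear the more-than-two threshold, giving counts of 2, 1, 1, 1 across the four days, as in the question's desired output.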

Calculate difference in rows recorded at different timestamp in SQL

I have a table with data as follows
Person_ID Date Sale
1 2016-05-08 2686
1 2016-05-09 2688
1 2016-05-14 2689
1 2016-05-18 2691
1 2016-05-24 2693
1 2016-05-25 2694
1 2016-05-27 2695
and there are a million such IDs for different people. The sale count is recorded only when a sale increases; otherwise it is not. Therefore the data for ID 2 can look different from ID 1.
Person_ID Date Sale
2 2016-05-10 26
2 2016-05-20 29
2 2016-05-18 30
2 2016-05-22 39
2 2016-05-25 40
A sale count of 29 on 5/20 means he sold 3 products on the 20th and had sold 26 up to 5/10, with no sales between those two dates.
Question: I want SQL/dynamic SQL to calculate the daily sales of all the agents and produce a report as follows:
ID Sale_511 Sale_512 Sale_513 -------------- Sale_519 Sale_520
2 0 0 0 --------------- 0 3
(29-26)
The question is how I use that data to calculate the report, since I do have data between 5/10 and 5/20. Can I just write a query saying A - B = C?
Can anyone help? Thank you.
P.S - New to SQL so learning.
Using Sql Server 2008.
Most SQL dialects support the lag() function. You can get what you want as:
select person_id, date,
       (sale - lag(sale) over (partition by person_id order by date)) as Daily_Sales
from t;
This produces one row per date for each person. This format is more typical for how SQL would return such results.
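A quick check of the lag() approach using Python's sqlite3 (window functions need SQLite 3.25 or later), loading person 1's sample rows:

```python
import sqlite3  # bundled SQLite must be >= 3.25 for window functions

con = sqlite3.connect(":memory:")
con.execute("create table t (person_id int, date text, sale int)")
con.executemany("insert into t values (?, ?, ?)", [
    (1, "2016-05-08", 2686), (1, "2016-05-09", 2688), (1, "2016-05-14", 2689),
    (1, "2016-05-18", 2691), (1, "2016-05-24", 2693), (1, "2016-05-25", 2694),
    (1, "2016-05-27", 2695),
])

# partition by person, order by date: each row sees the previous sale count
rows = con.execute("""
    select person_id, date,
           sale - lag(sale) over (partition by person_id order by date) as daily_sales
    from t
    order by date
""").fetchall()
```

The first row has no predecessor, so its difference is NULL; the rest are the per-date increments (2, 1, 2, 2, 1, 1).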
In SQL Server 2008, you can do:
select t.person_id, t.date,
       (t.sale - t2.sale) as Daily_Sales
from t outer apply
     (select top 1 t2.*
      from t t2
      where t2.person_id = t.person_id and t2.date < t.date
      order by t2.date desc
     ) t2;