PostgreSQL: showing different time periods in a single query

I have a query that returns the ratio of issuances: issuances from a specific network within a specific time period divided by total issuances from all networks. Right now it only returns the year-to-date ratios; I want to include several time periods, such as one month ago, two months ago, and so on. LEFT JOIN usually works for this kind of thing, but I couldn't figure it out here. How do I do it?
Here is the query:
SELECT IR1.network,
       count(*) / ((SELECT count(*)
                    FROM issuances_extended
                    WHERE status = 'completed'
                      AND issued_at >= date_trunc('year', current_date)) * 1.) AS issuance_ratio_ytd
FROM issuances_extended AS IR1
WHERE status = 'completed'
  AND issued_at >= date_trunc('year', current_date)
GROUP BY IR1.network
ORDER BY IR1.network

I would break your query into CTEs, something like this:
with periods (period_name, period_range) as (
  values
    ('YTD', daterange(date_trunc('year', current_date)::date, null)),
    ('LY',  daterange(date_trunc('year', current_date - interval '1 year')::date,
                      date_trunc('year', current_date)::date)),
    ('LM',  daterange(date_trunc('month', current_date - interval '1 month')::date,
                      date_trunc('month', current_date)::date))
    -- Add whatever other intervals you want to see
), period_totals as ( -- Get period totals
  select p.period_name, p.period_range, count(*) as total_issuances
  from periods p
  join issuances_extended i
    on i.status = 'completed'
   and i.issued_at::date <@ p.period_range
  group by p.period_name, p.period_range
)
select p.period_name, p.period_range,
       i.network, count(*) as network_issuances,
       1.0 * count(*) / p.total_issuances as issuance_ratio
from period_totals p
join issuances_extended i
  on i.status = 'completed'
 and i.issued_at::date <@ p.period_range
group by p.period_name, p.period_range, i.network, p.total_issuances;
The problem with this approach is that you get rows instead of columns, but you can pivot in a spreadsheet or reporting tool if you need to (or in SQL itself, as sketched below). This method simplifies the calculations and lets you add whatever period ranges you want by adding more rows to the periods CTE.
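If you would rather produce the columns in SQL, conditional aggregation can do the pivot. A minimal sketch, assuming the final SELECT above has been wrapped in a view or CTE named period_ratios (an illustrative name) with columns period_name, network and issuance_ratio; FILTER requires PostgreSQL 9.4 or later:
select network,
       max(issuance_ratio) filter (where period_name = 'YTD') as issuance_ratio_ytd,
       max(issuance_ratio) filter (where period_name = 'LY')  as issuance_ratio_ly,
       max(issuance_ratio) filter (where period_name = 'LM')  as issuance_ratio_lm
from period_ratios
group by network
order by network;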

Something like this? Obviously not tested
SELECT
    IR1.network,
    mon.t as period_start,
    count(*) / ((select count(*)
                 FROM issuances_extended
                 where status = 'completed'
                   and issued_at between mon.t and current_date) * 1.) as issuance_ratio
FROM
    issuances_extended as IR1,
    (SELECT generate_series('2022-01-01'::date,
                            '2022-07-01'::date,
                            '1 month') AS t) AS mon
WHERE
    status = 'completed' and
    (issued_at between mon.t and current_date)
GROUP BY
    IR1.network, mon.t
ORDER BY
    IR1.network, mon.t

I've managed to join these tables, so I am answering my own question for anyone who needs it. To add more periods, all you have to do is put new subqueries in additional LEFT JOINs and reference them in the outer query (IR3, IR4, and so on); a sketch of an extra IR3 join follows the query.
SELECT
IR1.network,
count(*) / (
(
select
count(*)
FROM
issuances_extended
where
status = 'completed'
and issued_at >= date_trunc('year', current_date)
) * 1./ 100
) as issuances_ratio_ytd,
max(coalesce(IR2.issuances_ratio_m0, 0)) as issuances_ratio_m0
FROM
issuances_extended as IR1
LEFT JOIN (
SELECT
network,
count(*) / (
(
select
count(*)
FROM
issuances_extended
where
status = 'completed'
and issued_at >= date_trunc('month', current_date)
) * 1./ 100
) as issuances_ratio_m0
FROM
issuances_extended
WHERE
status = 'completed'
and (issued_at >= date_trunc('month', current_date))
GROUP BY
network
) AS IR2 ON IR1.network = IR2.network
WHERE
status = 'completed'
and (issued_at >= date_trunc('year', current_date))
GROUP BY
IR1.network,
IR2.issuances_ratio_m0
order by
IR1.network
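For example, a sketch of what an IR3 block for the previous calendar month could look like (not tested; it mirrors the IR2 subquery and keeps the * 1. / 100 scaling used above):
LEFT JOIN (
  SELECT
    network,
    count(*) / (
      (
        select count(*)
        FROM issuances_extended
        where status = 'completed'
          and issued_at >= date_trunc('month', current_date - interval '1 month')
          and issued_at < date_trunc('month', current_date)
      ) * 1. / 100
    ) as issuances_ratio_m1
  FROM issuances_extended
  WHERE status = 'completed'
    and issued_at >= date_trunc('month', current_date - interval '1 month')
    and issued_at < date_trunc('month', current_date)
  GROUP BY network
) AS IR3 ON IR1.network = IR3.network
-- and in the outer query: add max(coalesce(IR3.issuances_ratio_m1, 0)) as issuances_ratio_m1
-- to the SELECT list and IR3.issuances_ratio_m1 to the GROUP BY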

Related

How to subtract two timestamps in SQL and then count?

I basically want to find out how many users paid within 15 minutes, 30 minutes and 60 minutes, comparing my payment_time and trigger_time columns.
I have the following query:
with redshift_direct() as conn:
trigger_time_1 = pd.read_sql(f"""
with new_data as
(
select
cycle_end_date
, prime_tagging_by_issuer_and_product
, u.user_id
, settled_status
, delay,
ots_created_at + interval '5:30 hours' as payment_time
,case when to_char(cycle_end_date,'DD') = '15' then 'Odd' else 'Even' end as cycle_order
from
settlement_summary_from_snapshot s
left join (select distinct user_phone_number, user_id from user_events where event_name = 'UserCreatedEvent') u
on u.user_id = s.user_id
and cycle_type = 'end_cycle'
and cycle_end_date > '2021-11-30' and cycle_end_date < '2022-01-15'
)
select
bucket_id
, cycle_end_date, d.cycle_order
, date(cycle_end_date) as t_cycle_end_date
,d.prime_tagging_by_issuer_and_product
,source
,status as cause
,split_part(campaign_name ,'|', 1) as campaign
,split_part(campaign_name ,'|', 2) as sms_cycle_end_date
,split_part(campaign_name ,'|', 3) as day
,split_part(campaign_name ,'|', 4) as type
,to_char(to_date(split_part(campaign_name ,'|', 2) , 'DD/MM/YYYY'), 'YYYY-MM-DD') as campaign_date,
d.payment_time, payload_event_timestamp + interval '5:30 hours' as trigger_time
,count( s.user_id) as count
from sms_callback_events s
inner join new_data d
on s.user_id = d.user_id
where bucket_id > 'date_2021_11_30' and bucket_id < 'date_2022_01_15'
and campaign_name like '%RC%'
and event_name = 'SmsStatusUpdatedEvent'
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14
""",conn)
How do I add 3 columns with the number of users who paid within 15, 30 and 60 minutes after trigger_time in this query? I was doing it with pandas, but I want to do it in the query itself. Can someone help?
I wrote my own DATEDIFF function, which returns the integer difference between two dates: by day, by month, by year, by hour, by minute, and so on. You can use this function in your queries.
DATEDIFF Function SQL Code on GitHub
Sample query using this DATEDIFF function (a sketch of applying it to the question's 15/30/60-minute buckets follows the result):
select
datediff('minute', mm.start_date, mm.end_date) as diff_minute
from
(
select
'2022-02-24 09:00:00.100'::timestamp as start_date,
'2022-02-24 09:15:21.359'::timestamp as end_date
) mm;
Result:
---------------
diff_minute
---------------
15
---------------
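Applied to the original question, a datediff like this can drive conditional counts. A rough sketch, assuming the joined result of new_data and sms_callback_events from the question is available as joined_payments (a hypothetical name), and using Redshift's built-in DATEDIFF:
select
  count(distinct case when datediff(minute, trigger_time, payment_time) between 0 and 15
                      then user_id end) as paid_within_15_min,
  count(distinct case when datediff(minute, trigger_time, payment_time) between 0 and 30
                      then user_id end) as paid_within_30_min,
  count(distinct case when datediff(minute, trigger_time, payment_time) between 0 and 60
                      then user_id end) as paid_within_60_min
from joined_payments; -- hypothetical stand-in for the join shown in the question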

How to solve a nested aggregate function in SQL?

I'm trying to use a nested aggregate function. I know that SQL does not support it, but I really need to do something like the query below. Basically, I want to count the number of users for each day, but only the users that haven't completed an order within a 15-day window (relative to that day) and that have completed an order within the 16-to-30-day window (relative to that day). I already know that it is not possible to solve this with a regular subquery (it does not allow the subquery values to change for each date). The "id" and "state" attributes belong to the orders. Also, I'm using Fivetran with Snowflake.
SELECT
db.created_at::date as Date,
count(case when
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-15,Date) and dateadd(day,-1,Date)) then db.id end)
= 0) and
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-30,Date) and dateadd(day,-16,Date)) then db.id end)
> 0) then db.user end)
FROM
data_base as db
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
In other words, I want to transform the query below in a way that the "current_date" changes for each date.
WITH completed_15_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-15,current_date) and dateadd(day,-1,current_date)
group by User
),
completed_16_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-30,current_date) and dateadd(day,-16,current_date)
group by User
)
SELECT
date(db.created_at) as Date,
count(distinct case when comp_15.completed = 0 and comp_16.completed > 0 then comp_15.user end) as "Total Users Churn",
count(distinct case when comp_15.completed > 0 then comp_15.user end) as "Total Users Active",
week(Date) as Week
FROM
data_base as db
left join completed_15_days_before as comp_15 on comp_15.user = db.user
left join completed_16_days_before as comp_16 on comp_16.user = db.user
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
Does anyone have a clue on how to solve this puzzle? Thank you very much!
The following should give you roughly what you want. It is difficult to test without sample data, but it should be a good enough starting point for you to amend into exactly what you need.
I've commented the code to hopefully explain what each section is doing.
-- set parameter for the first date you want to generate the resultset for
set start_date = TO_DATE('2020-01-01','YYYY-MM-DD');
-- calculate the number of days between the start_date and the current date
set num_days = (Select datediff(day, $start_date , current_date()+1));
--generate a list of all the dates from the start date to the current date
-- i.e. every date that needs to appear in the resultset
WITH date_list as (
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date_item
from table (generator(rowcount => ($num_days)))
)
--Create a list of all the orders that are in scope
-- i.e. 30 days before the start_date up to the current date
-- amend WHERE clause to in/exclude records as appropriate
,order_list as (
SELECT created_at, rt_id
from data_base
where created_at between dateadd(day,-30,$start_date) and current_date()
and state = 'finished'
)
SELECT dl.date_item
,COUNT (DISTINCT ol30.RT_ID) AS USER_COUNT
,COUNT (ol30.RT_ID) as ORDER_COUNT
FROM date_list dl
-- get all orders between -30 and -16 days of each date in date_list
left outer join order_list ol30 on ol30.created_at between dateadd(day,-30,dl.date_item) and dateadd(day,-16,dl.date_item)
-- exclude records that have the same RT_ID as in the ol30 dataset but have a date between 0 and -15 days of the date in date_list
WHERE NOT EXISTS (SELECT ol15.RT_ID
FROM order_list ol15
WHERE ol30.RT_ID = ol15.RT_ID
AND ol15.created_at between dateadd(day,-15,dl.date_item) and dl.date_item)
GROUP BY dl.date_item
ORDER BY dl.date_item;

Divide results from two queries by another query in SQL

I have this query in Metabase:
with l1 as (SELECT date_trunc ('day', Ticket_Escalated_At) as time_scale, count (Ticket_ID) as chat_per_day
FROM CHAT_TICKETS where SUPPORT_QUEUE = 'transfer_investigations'
and date_trunc('month', TICKET_ESCALATED_AT) > now() - interval '6' Month
GROUP by 1)
, l2 as (SELECT date_trunc('day', created_date) as week, count(*) as TI_watchman_ticket
FROM jira_issues
WHERE issue_type NOT IN ('Transfer - General', 'TI - Advanced')
and date_trunc('month', created_date) > now() - interval '6' Month
and project_key = 'TI2'
GROUP BY 1)
SELECT l1.* from l1
UNION SELECT l2.* from l2
ORDER by 1
and this one:
with hours as (SELECT date_trunc('day', ws.start_time) as date_
,(ifnull(sum((case when ws.shift_position = 'TI - Non-watchman' then (minutes_between(ws.end_time, ws.start_time)/60) end)),0) + ifnull(sum((case when ws.shift_position = 'TI - Watchman' then (minutes_between(ws.end_time, ws.start_time)/60) end)),0) ) as total
from chat_agents a
join wiw_shifts ws on a.email = ws.user_email
left join people_ops.employees h on substr(h.email,1, instr(h.email,'#revolut') - 1) = a.login
where (seniority != 'Lead' or seniority is null)
and date_trunc('month', ws.start_time) > now() - interval '6' Month
GROUP BY 1)
I would like to divide the output of the UNION in the first query by the result of the second one. Any ideas?
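Not tested, but one way to sketch it: keep all three queries as CTEs in a single statement, align the daily buckets, and divide. The CTE bodies below stand for the queries shown above; tickets_per_hour is just an illustrative name, and the sketch assumes the dialect supports CTEs and UNION ALL:
with l1 as (
    ... -- body of l1 from the first query, unchanged
), l2 as (
    ... -- body of l2 from the first query, unchanged
), hours as (
    ... -- body of the second query, unchanged
), tickets as (
    select time_scale as date_, chat_per_day as cnt from l1
    union all
    select week, TI_watchman_ticket from l2
)
select t.date_, sum(t.cnt) / h.total as tickets_per_hour
from tickets t
join hours h on h.date_ = t.date_
group by t.date_, h.total
order by t.date_;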

How do I display records in the same row even though I am grouping by 2 columns that currently put them in different rows?

This is the output I am getting now, but I want all the records for one gateway in one row. I am trying to find the damage count and total count of packages processed by an airport in a week. Currently I am grouping by airport and week, so I get separate rows for each airport and week. I want the records for a particular airport in a single row, with the weeks in that same row.
I tried a conditional GROUP BY, but that did not work.
select tmp.gateway,tmp.weekbucket, sum(tmp.damaged_count) as DamageCount, sum(tmp.total_count) as TotalCount, round(sum(tmp.DPMO),0) as DPMO from
(
select a.gateway,
date_trunc('week', (a.processing_date + interval '1 day')) - interval '1 day' as weekbucket,
count(distinct(b.fulfillment_shipment_id||b.package_id)) as damaged_count,
count(distinct(a.fulfillment_shipment_id||a.package_id)) as total_count,
count(distinct(b.fulfillment_shipment_id||b.package_id))*1.00/count(distinct(a.Fulfillment_Shipment_id || a.package_id))*1000000 as DPMO
from booker.d_air_shipments_na a
left join trex.d_ps_packages b
on (a.fulfillment_shipment_id||a.package_id =b.Fulfillment_Shipment_id||b.package_id)
where a.processing_date >= current_date-7
and (exception_summary in ('Reprint-Damaged Label') or exception_summary IS NULL)
and substring(route, position(a.gateway IN route) +6, 1) <> 'K'
group by a.gateway, weekbucket) as tmp
group by tmp.gateway, tmp.weekbucket
order by tmp.gateway, tmp.weekbucket desc;
Since your date range spans the start of one week and the end of another, you will likely get 2 rows per gateway. You can try removing weekbucket from the outer GROUP BY, taking MAX(weekbucket) instead, and summing the counts across both week dates.
select
    tmp.gateway,
    max(tmp.weekbucket),
    sum(tmp.damaged_count) as DamageCount,
    sum(tmp.total_count) as TotalCount,
    round(sum(tmp.DPMO), 0) as DPMO
from
(
    select a.gateway,
        date_trunc('week', (a.processing_date + interval '1 day')) - interval '1 day' as weekbucket,
        count(distinct(b.fulfillment_shipment_id || b.package_id)) as damaged_count,
        count(distinct(a.fulfillment_shipment_id || a.package_id)) as total_count,
        count(distinct(b.fulfillment_shipment_id || b.package_id)) * 1.00
            / count(distinct(a.Fulfillment_Shipment_id || a.package_id)) * 1000000 as DPMO
    from booker.d_air_shipments_na a
    left join trex.d_ps_packages b
        on (a.fulfillment_shipment_id || a.package_id = b.Fulfillment_Shipment_id || b.package_id)
    where a.processing_date >= current_date - 7
        and (exception_summary in ('Reprint-Damaged Label') or exception_summary IS NULL)
        and substring(route, position(a.gateway IN route) + 6, 1) <> 'K'
    group by a.gateway, weekbucket
) as tmp
group by tmp.gateway
order by tmp.gateway, max(tmp.weekbucket) desc;
So you want to pivot the two weeks into a single row with two sets of aggregates?:
select
    tmp.gateway,
    min(case when rn = 1 then tmp.damaged_count end) as DamageCountWeek1,
    min(case when rn = 2 then tmp.damaged_count end) as DamageCountWeek2,
    min(case when rn = 1 then tmp.total_count end) as TotalCountWeek1,
    min(case when rn = 2 then tmp.total_count end) as TotalCountWeek2,
    min(case when rn = 1 then round(tmp.DPMO, 0) end) as DPMOWeek1,
    min(case when rn = 2 then round(tmp.DPMO, 0) end) as DPMOWeek2
from (
    select row_number() over (partition by gateway order by weekbucket) as rn,
        ...
) as tmp
group by tmp.gateway
order by tmp.gateway;

PostgreSQL "Subquery must return only one column" error

Hello, I'm trying to do a cohort study and I'm having trouble with a subquery error when running my query. I can compute the repeat percentage on its own, but when I add the number of new customers and the number of repeaters, the error appears. I want the details behind this percentage (the ratio of repeaters to the number of new customers) in my final result.
Thank you very much for your help! :)
The error is on lines 24-25-26 of the query below:
SELECT time_table.*,
(
WITH new_customers AS
(
SELECT DISTINCT
order_report._customer_id
FROM order_report
INNER JOIN
(
SELECT DISTINCT _customer_id
FROM order_report
WHERE order_report._created_at::timestamp BETWEEN time_table.first_order_start AND time_table.first_order_stop
AND _order_status = 'paid' AND _order_product_status != 'UNAVAILABLE'
) AS period_orders ON period_orders._customer_id = order_report._customer_id
WHERE _order_status = 'paid' AND _order_product_status != 'UNAVAILABLE'
GROUP BY order_report._customer_id
HAVING MIN(order_report._created_at::timestamp) BETWEEN time_table.first_order_start AND time_table.first_order_stop
)
SELECT
COUNT(*) as repeaters,
(SELECT COUNT(*) FROM new_customers) as new_customers,
COUNT(*)::float/(SELECT COUNT(*) FROM new_customers) as repeat_percent
FROM
(
SELECT COUNT(*), order_report._customer_id
FROM order_report
INNER JOIN new_customers
ON new_customers._customer_id = order_report._customer_id
WHERE order_report._created_at::timestamp <= time_table.stop
AND _order_status = 'paid' AND _order_product_status != 'UNAVAILABLE'
GROUP BY order_report._customer_id
HAVING COUNT(*) > 1
) AS REPEATS
)
FROM
(
WITH time_serie AS
(
SELECT
generate_series AS start,
(generate_series + interval '3 month' - interval '1 second') AS stop
FROM generate_series('2017-01-01 00:00'::timestamp, '2017-06-30', '1 month')
),
first_order_serie AS
(
SELECT
start AS first_order_start,
stop AS first_order_stop
FROM time_serie
)
SELECT * FROM time_serie, first_order_serie) AS time_table
I think you should split the query into parts and inspect them one by one; then you can detect which part is wrong. Once you find it, share it again. I think your problem may be this part of the query:
-------------------
FROM
(
WITH time_serie AS
(
SELECT
generate_series AS start,
(generate_series + interval '3 month' - interval '1 second') AS stop
FROM generate_series('2017-01-01 00:00'::timestamp, '2017-06-30', '1 month')
),
first_order_serie AS
(
SELECT
start AS first_order_start,
stop AS first_order_stop
FROM time_serie
)
Your query starts with a SELECT, so everything after that point is a subquery.
Write your query with all the CTEs first:
with new_customers as (
. . .
),
time_serie as (
),
first_order_serie as (
)
select . . .
from . . .
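For example, a minimal sketch of that structure using just the time-table part of the question's query (the cohort counts would then go into further CTEs, or into a LATERAL subquery, which, unlike a scalar subquery in the SELECT list, is allowed to return several columns):
with time_serie as (
    select g as start,
           g + interval '3 month' - interval '1 second' as stop
    from generate_series('2017-01-01 00:00'::timestamp, '2017-06-30', '1 month') as g
),
first_order_serie as (
    select start as first_order_start, stop as first_order_stop
    from time_serie
)
select *
from time_serie, first_order_serie;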