Divide results from two queries by another query in SQL

I have this query in Metabase:
with l1 as (SELECT date_trunc ('day', Ticket_Escalated_At) as time_scale, count (Ticket_ID) as chat_per_day
FROM CHAT_TICKETS where SUPPORT_QUEUE = 'transfer_investigations'
and date_trunc('month', TICKET_ESCALATED_AT) > now() - interval '6' Month
GROUP by 1)
, l2 as (SELECT date_trunc('day', created_date) as week, count(*) as TI_watchman_ticket
FROM jira_issues
WHERE issue_type NOT IN ('Transfer - General', 'TI - Advanced')
and date_trunc('month', created_date) > now() - interval '6' Month
and project_key = 'TI2'
GROUP BY 1)
SELECT l1.* from l1
UNION SELECT l2.* from l2
ORDER by 1
and this one:
with hours as (SELECT date_trunc('day', ws.start_time) as date_
,(ifnull(sum((case when ws.shift_position = 'TI - Non-watchman' then (minutes_between(ws.end_time, ws.start_time)/60) end)),0) + ifnull(sum((case when ws.shift_position = 'TI - Watchman' then (minutes_between(ws.end_time, ws.start_time)/60) end)),0) ) as total
from chat_agents a
join wiw_shifts ws on a.email = ws.user_email
left join people_ops.employees h on substr(h.email,1, instr(h.email,'#revolut') - 1) = a.login
where (seniority != 'Lead' or seniority is null)
and date_trunc('month', ws.start_time) > now() - interval '6' Month
GROUP BY 1)
I would like to divide the output of the UNION in the first query by the result of the second one. Any ideas?
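For what it's worth, one way to combine them is to keep all three CTEs in a single statement, sum both ticket counts per day, and join to the hours query on the day before dividing. This is an untested sketch, assuming both queries bucket by the same day; the tickets_per_hour alias and the commented placeholders (which stand for the CTE bodies above, unchanged) are only illustrative:
with l1 as ( /* first CTE from the first query, unchanged */ ),
     l2 as ( /* second CTE from the first query, unchanged */ ),
     hours as ( /* the CTE from the second query, unchanged */ ),
     tickets as (
       select time_scale, chat_per_day from l1
       union all
       select week, TI_watchman_ticket from l2
     )
select t.time_scale,
       sum(t.chat_per_day) / nullif(h.total, 0) as tickets_per_hour  -- NULLIF guards against division by zero
from tickets t
join hours h on h.date_ = t.time_scale
group by t.time_scale, h.total
order by 1;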


PostgreSQL showing different time periods in a single query

I have a query that returns the ratio of issuances per network: issuances from a specific network within a specific time period divided by the total issuances from all networks in that period. Right now it only returns the year-to-date ratios; I want to include several other time periods as well, such as one month ago, two months ago, etc. LEFT JOIN usually works for this kind of thing, but I couldn't figure it out here. How do I do it?
Here is the query:
SELECT IR1.network,
count(*) / ((select count(*) FROM issuances_extended
where status = 'completed' and
issued_at >= date_trunc('year',current_date)) * 1.) as issuance_ratio_ytd
FROM issuances_extended as IR1 WHERE status = 'completed' and
(issued_at >= date_trunc('year',current_date))
GROUP BY
IR1.network
order by IR1.network
I would break your query into CTEs something like this:
with periods (period_name, period_range) as (
values
('YTD', daterange(date_trunc('year', current_date)::date, null)),
('LY', daterange(date_trunc('year', current_date - interval '1 year')::date,
date_trunc('year', current_date)::date)),
('MTD', daterange(date_trunc('month', current_date - interval '1 month')::date,
date_trunc('month', current_date)::date))
-- Add whatever other intervals you want to see
), period_totals as ( -- Get period totals
select p.period_name, p.period_range, count(*) as total_issuances
from periods p
join issuances_extended i
on i.status = 'completed'
and i.issued_at <@ p.period_range
group by p.period_name, p.period_range
)
select p.period_name, p.period_range,
i.network, count(*) as network_issuances,
1.0 * count(*) / p.total_issuances as issuance_ratio
from period_totals p
join issuances_extended i
on i.status = 'completed'
and i.issued_at <@ p.period_range
group by p.period_name, p.period_range, i.network, p.total_issuances;
The problem with this is that you get rows instead of columns, but you can use a spreadsheet program or reporting tool to pivot if you need to. This method simplifies the calculations and lets you add whatever period ranges you want by adding more values to the periods CTE.
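If you do want one column per period directly in SQL, a conditional-aggregation layer over the same result could look like this (untested sketch; the period names are the ones defined in the periods CTE above, and the commented placeholder stands for the final SELECT of that query):
select network,
       max(case when period_name = 'YTD' then issuance_ratio end) as issuance_ratio_ytd,
       max(case when period_name = 'LY'  then issuance_ratio end) as issuance_ratio_ly,
       max(case when period_name = 'MTD' then issuance_ratio end) as issuance_ratio_mtd
from ( /* the final SELECT from the query above */ ) ratios
group by network
order by network;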
Something like this? Obviously not tested
SELECT
mon.t AS period_start,
IR1.network,
count(*)/((select count(*) FROM issuances_extended
where status = 'completed' and
issued_at between mon.t and current_date ) * 1.) as issuance_ratio
FROM
issuances_extended as IR1 ,
(
SELECT
generate_series('2022-01-01'::date,
'2022-07-01'::date, interval '1 month') AS t)
AS mon
WHERE
status = 'completed' and
(issued_at between mon.t and current_date)
GROUP BY
mon.t,
IR1.network
ORDER BY
mon.t,
IR1.network
I've managed to join these tables, so I am answering my own question for those who might need some help. To add more time periods, all you have to do is put new subqueries in LEFT JOINs and reference them in the base query (IR3, IR4, and so on).
SELECT
IR1.network,
count(*) / (
(
select
count(*)
FROM
issuances_extended
where
status = 'completed'
and issued_at >= date_trunc('year', current_date)
) * 1./ 100
) as issuances_ratio_ytd,
max(coalesce(IR2.issuances_ratio_m0, 0)) as issuances_ratio_m0
FROM
issuances_extended as IR1
LEFT JOIN (
SELECT
network,
count(*) / (
(
select
count(*)
FROM
issuances_extended
where
status = 'completed'
and issued_at >= date_trunc('month', current_date)
) * 1./ 100
) as issuances_ratio_m0
FROM
issuances_extended
WHERE
status = 'completed'
and (issued_at >= date_trunc('month', current_date))
GROUP BY
network
) AS IR2 ON IR1.network = IR2.network
WHERE
status = 'completed'
and (issued_at >= date_trunc('year', current_date))
GROUP BY
IR1.network,
IR2.issuances_ratio_m0
order by
IR1.network

How to get Postgres to return 0 for empty rows

I have a query which gets data summarised between two dates, like so:
SELECT date(created_at),
COUNT(COALESCE(id, 0)) AS total_orders,
SUM(COALESCE(total_price, 0)) AS total_price,
SUM(COALESCE(taxes, 0)) AS taxes,
SUM(COALESCE(shipping, 0)) AS shipping,
AVG(COALESCE(total_price, 0)) AS average_order_value,
SUM(COALESCE(total_discount, 0)) AS total_discount,
SUM(total_price - COALESCE(taxes, 0) - COALESCE(shipping, 0) - COALESCE(total_discount, 0)) as net_sales
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
order by created_at::date desc
However, for dates that do not have any orders, the query returns nothing, and I'd like it to return 0.
I have tried COALESCE, but that doesn't seem to do the trick.
Any suggestions?
This should be substantially faster - and correct:
SELECT *
, total_price - taxes - shipping - total_discount AS net_sales -- ⑤
FROM (
SELECT created_at
, COALESCE(total_orders , 0) AS total_orders
, COALESCE(total_price , 0) AS total_price
, COALESCE(taxes , 0) AS taxes
, COALESCE(shipping , 0) AS shipping
, COALESCE(average_order_value , 0) AS average_order_value
, COALESCE(total_discount , 0) AS total_discount
FROM generate_series(timestamp '2022-07-20' -- ①
, timestamp '2022-07-26'
, interval '1 day') AS g(created_at)
LEFT JOIN ( -- ③
SELECT created_at::date
, count(*) AS total_orders -- ⑥
, sum(total_price) AS total_price
, sum(taxes) AS taxes
, sum(shipping) AS shipping
, avg(total_price) AS average_order_value
, sum(total_discount) AS total_discount
FROM orders
WHERE shop_id = 43
AND active -- simpler
AND created_at >= '2022-07-20'
AND created_at < '2022-07-27' -- ② !
GROUP BY 1
) o USING (created_at) -- ④
) sub
ORDER BY created_at DESC;
db<>fiddle here
I copied, simplified, and extended Xu's fiddle for comparison.
① Why this particular form for generate_series()? See:
Generating time series between two dates in PostgreSQL
② Assuming created_at is of data type timestamp, your original formulation is most probably incorrect: created_at <= '2022-07-26' would include only the first instant of '2022-07-26' and exclude the rest of that day. To include all of '2022-07-26', use created_at < '2022-07-27'. See:
How do I write a function in plpgsql that compares a date with a timestamp without time zone?
③ The LEFT JOIN is the core feature of this answer. Generate all days with generate_series(), independently aggregate days from table orders, then LEFT JOIN to retain one row per day like you requested.
④ I made the column name match created_at, so we can conveniently shorten the join syntax with the USING clause.
⑤ Compute net_sales in an outer SELECT after replacing NULL values, so we need COALESCE() only once.
⑥ count(*) is equivalent to COUNT(COALESCE(id, 0)) in any case, but cheaper. See:
Optimizing GROUP BY + COUNT DISTINCT on unnested jsonb column
PostgreSQL: running count of rows for a query 'by minute'
Please refer to the below script.
SELECT *
FROM
(SELECT date(created_at) AS created_at,
COUNT(id) AS total_orders,
SUM(total_price) AS total_price,
SUM(taxes) AS taxes,
SUM(shipping) AS shipping,
AVG(total_price) AS average_order_value,
SUM(total_discount) AS total_discount,
SUM(total_price - taxes - shipping - total_discount) AS net_sales
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
UNION
SELECT dates AS created_at,
0 AS total_orders,
0 AS total_price,
0 AS taxes,
0 AS shipping,
0 AS average_order_value,
0 AS total_discount,
0 AS net_sales
FROM generate_series('2022-07-20', '2022-07-26', interval '1 day') AS dates
WHERE dates NOT IN
(SELECT created_at
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26' ) ) a
ORDER BY created_at::date desc;
There is a sample for your reference: Sample
I reproduced your duplicate test cases on my side. The root cause is the created_at field (data type: timestamp), which is why there are duplicate lines.
The script below is correct for your request.
SELECT *
FROM
(SELECT date(created_at) AS created_at,
COUNT(id) AS total_orders,
SUM(total_price) AS total_price,
SUM(taxes) AS taxes,
SUM(shipping) AS shipping,
AVG(total_price) AS average_order_value,
SUM(total_discount) AS total_discount,
SUM(total_price - taxes - shipping - total_discount) AS net_sales
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
UNION
SELECT dates AS created_at,
0 AS total_orders,
0 AS total_price,
0 AS taxes,
0 AS shipping,
0 AS average_order_value,
0 AS total_discount,
0 AS net_sales
FROM generate_series('2022-07-20', '2022-07-26', interval '1 day') AS dates
WHERE dates NOT IN
(SELECT date (created_at)
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26' ) ) a
ORDER BY created_at::date desc;
Here is a sample that matches your setup: Link
You can use WITH RECURSIVE to build a table of dates and then select the dates that are not in your table:
WITH RECURSIVE t(d) AS (
(SELECT '2015-01-01'::date)
UNION ALL
(SELECT d + 1 FROM t WHERE d + 1 <= '2015-01-10')
) SELECT d FROM t WHERE d NOT IN (SELECT d_date FROM tbl);
Look at this post: https://stackoverflow.com/questions/28583379/find-missing-dates-postgresql

Count of consecutive days ORACLE SQL

I need help with a query where I need to count the consecutive days, like this:
select
a.numcad, a.datapu , f.datapu , nvl(to_char(f.datapu, 'DD'),0)dia,
row_number() over (partition by a.numcad, f.datapu order by f.datapu)particao
from
ronda.r066apu a
left join (select t.numcad, t.numemp, t.datacc, t.datapu
from ronda.r070acc t
where t.datacc >= '21/01/2022'
and t.datacc <= trunc(sysdate)
group by t.numcad, t.numemp, t.datacc, t.datapu)f
on a.numemp = f.numemp
and a.numcad = f.numcad
and a.datapu = f.datapu
where a.numcad = 2675
and A.DATAPU >= '21/01/2022'
and A.DATAPU <= trunc(sysdate)
group by a.numcad, a.datapu, f.datapu, f.datacc
order by a.datapu
The result is: between 24/01/2022 and 04/02/2022 there are 12 days. I need to know this count, but I will always get '21/month/year' (the 21st of the month) as the starting date.
You can try subtracting the two dates, which in Oracle gives the number of days between them:
SELECT TO_DATE('2022-02-04', 'YYYY-MM-DD') -
TO_DATE('2022-01-24', 'YYYY-MM-DD')
FROM dual
This returns 11; add 1 if you want the count to include both endpoints (12 days).
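If you also need the streak length per person rather than a fixed pair of dates, the usual gaps-and-islands pattern can count consecutive days. This is an untested sketch that reuses the table and column names from the question (ronda.r066apu, numcad, datapu) and assumes datapu holds one row per person per day:
select numcad,
       min(datapu) as streak_start,
       max(datapu) as streak_end,
       count(*)    as consecutive_days
from (
  select numcad, datapu,
         -- rows belonging to the same consecutive run share the same grp value
         datapu - row_number() over (partition by numcad order by datapu) as grp
  from ronda.r066apu
  where datapu >= date '2022-01-21'
    and datapu <= trunc(sysdate)
)
group by numcad, grp
order by numcad, streak_start;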

How to subtract two timestamps in SQL and then count?

I basically want to find out how many users paid within 15, 30 and 60 minutes of the trigger, by comparing payment_time and trigger_time.
I have the following query:
with redshift_direct() as conn:
trigger_time_1 = pd.read_sql(f"""
with new_data as
(
select
cycle_end_date
, prime_tagging_by_issuer_and_product
, u.user_id
, settled_status
, delay,
ots_created_at + interval '5:30 hours' as payment_time
,case when to_char(cycle_end_date,'DD') = '15' then 'Odd' else 'Even' end as cycle_order
from
settlement_summary_from_snapshot s
left join (select distinct user_phone_number, user_id from user_events where event_name = 'UserCreatedEvent') u
on u.user_id = s.user_id
and cycle_type = 'end_cycle'
and cycle_end_date > '2021-11-30' and cycle_end_date < '2022-01-15'
)
select
bucket_id
, cycle_end_date, d.cycle_order
, date(cycle_end_date) as t_cycle_end_date
,d.prime_tagging_by_issuer_and_product
,source
,status as cause
,split_part(campaign_name ,'|', 1) as campaign
,split_part(campaign_name ,'|', 2) as sms_cycle_end_date
,split_part(campaign_name ,'|', 3) as day
,split_part(campaign_name ,'|', 4) as type
,to_char(to_date(split_part(campaign_name ,'|', 2) , 'DD/MM/YYYY'), 'YYYY-MM-DD') as campaign_date,
d.payment_time, payload_event_timestamp + interval '5:30 hours' as trigger_time
,count( s.user_id) as count
from sms_callback_events s
inner join new_data d
on s.user_id = d.user_id
where bucket_id > 'date_2021_11_30' and bucket_id < 'date_2022_01_15'
and campaign_name like '%RC%'
and event_name = 'SmsStatusUpdatedEvent'
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14
""",conn)
How do I get three columns with the number of users who paid within 15, 30 and 60 minutes after trigger_time in this query? I was doing it with pandas, but I want to do it in the SQL itself. Can someone help?
I wrote my own DATEDIFF function, which returns the integer difference between two dates by day, month, year, hour, minute, and so on. You can use this function in your queries.
DATEDIFF Function SQL Code on GitHub
Sample query using the DATEDIFF function:
select
datediff('minute', mm.start_date, mm.end_date) as diff_minute
from
(
select
'2022-02-24 09:00:00.100'::timestamp as start_date,
'2022-02-24 09:15:21.359'::timestamp as end_date
) mm;
Result:
---------------
diff_minute
---------------
15
---------------
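Applied to the question, the minute difference can then feed conditional counts, giving the three columns in one pass. An untested sketch, assuming the existing query is wrapped as a subquery so that trigger_time, payment_time, user_id and campaign are plain columns on each row (the ellipsis stands for that inner query):
select campaign,
       count(case when datediff('minute', trigger_time, payment_time) between 0 and 15 then user_id end) as paid_within_15_min,
       count(case when datediff('minute', trigger_time, payment_time) between 0 and 30 then user_id end) as paid_within_30_min,
       count(case when datediff('minute', trigger_time, payment_time) between 0 and 60 then user_id end) as paid_within_60_min
from (
  -- the original query above, with the final count()/GROUP BY removed,
  -- so each row carries trigger_time, payment_time and user_id
  ...
) q
group by campaign;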

How do I display records in the same row although I am using group by on 2 columns that appear in different rows right now?

This is the output I am getting now, but I want all the records for one gateway in one row. I am trying to find the damage count and total count of packages processed by an airport in a week. Currently I am grouping by airport and week, so I am getting the records in different rows for each airport and week. I want the records for a particular airport in a single row, with the weeks side by side in that row.
I tried putting a conditional group by but that did not work.
select tmp.gateway,tmp.weekbucket, sum(tmp.damaged_count) as DamageCount, sum(tmp.total_count) as TotalCount, round(sum(tmp.DPMO),0) as DPMO from
(
select a.gateway,
date_trunc('week', (a.processing_date + interval '1 day')) - interval '1 day' as weekbucket,
count(distinct(b.fulfillment_shipment_id||b.package_id)) as damaged_count,
count(distinct(a.fulfillment_shipment_id||a.package_id)) as total_count,
count(distinct(b.fulfillment_shipment_id||b.package_id))*1.00/count(distinct(a.Fulfillment_Shipment_id || a.package_id))*1000000 as DPMO
from booker.d_air_shipments_na a
left join trex.d_ps_packages b
on (a.fulfillment_shipment_id||a.package_id =b.Fulfillment_Shipment_id||b.package_id)
where a.processing_date >= current_date-7
and (exception_summary in ('Reprint-Damaged Label') or exception_summary IS NULL)
and substring(route, position(a.gateway IN route) +6, 1) <> 'K'
group by a.gateway, weekbucket) as tmp
group by tmp.gateway, tmp.weekbucket
order by tmp.gateway, tmp.weekbucket desc;
Since your date range spans parts of two weeks, you will likely get two rows per gateway. You can try removing weekbucket from the outer GROUP BY, taking MAX(weekbucket) instead, and summing the counts across both weeks:
select
tmp.gateway,
max(tmp.weekbucket),
sum(tmp.damaged_count) as DamageCount,
sum(tmp.total_count) as TotalCount,
round(sum(tmp.DPMO), 0) as DPMO
from
(
select a.gateway,
date_trunc('week', (a.processing_date + interval '1 day')) - interval '1 day' as weekbucket,
count(distinct(b.fulfillment_shipment_id||b.package_id)) as damaged_count,
count(distinct(a.fulfillment_shipment_id||a.package_id)) as total_count,
count(distinct(b.fulfillment_shipment_id||b.package_id))*1.00/count(distinct(a.Fulfillment_Shipment_id || a.package_id))*1000000 as DPMO
from booker.d_air_shipments_na a
left join trex.d_ps_packages b
on (a.fulfillment_shipment_id||a.package_id = b.Fulfillment_Shipment_id||b.package_id)
where a.processing_date >= current_date-7
and (exception_summary in ('Reprint-Damaged Label') or exception_summary IS NULL)
and substring(route, position(a.gateway IN route) +6, 1) <> 'K'
group by a.gateway, weekbucket) as tmp
group by tmp.gateway
order by tmp.gateway,
max(tmp.weekbucket) desc;
So you want to pivot the two weeks into a single row, with two sets of aggregates?
select
tmp.gateway,
min(case when rn = 1 then tmp.damaged_count end) as DamageCountWeek1,
min(case when rn = 2 then tmp.damaged_count end) as DamageCountWeek2,
min(case when rn = 1 then tmp.total_count end) as TotalCountWeek1,
min(case when rn = 2 then tmp.total_count end) as TotalCountWeek2,
min(case when rn = 1 then round(tmp.DPMO, 0) end) as DPMOWeek1,
min(case when rn = 2 then round(tmp.DPMO, 0) end) as DPMOWeek2
from (
select row_number() over (partition by gateway order by weekbucket) as rn,
...
) as tmp
group by tmp.gateway
order by tmp.gateway;