I'm trying to find the bounce rate of the top 10 UTM sources. There is no column for bounces in the table, so I have to derive it. I've created one query to find the top 10 UTM sources and another query to find the bounce rate; I just can't figure out how to combine these two queries.
The database table contains:
1) cuuid - cookie ID
2) session - session ID
3) duration
4) Each row represents a page view
SELECT
TOP 10 regexp_replace(regexp_substr(url, 'utm_source\\=[^\\&]*'), 'utm_source='),
COUNT(DISTINCT(cuuid)) as "Total Unique Visitors",
COUNT(DISTINCT(session)) as "Total Unique Sessions",
COUNT(*) as "Total Page Views",
CAST(COUNT(DISTINCT(session)) AS FLOAT)/CAST(COUNT(DISTINCT(cuuid)) AS FLOAT) AS "Average Sessions per Visitor",
CAST(COUNT(*) AS FLOAT)/CAST(COUNT(DISTINCT(session)) AS FLOAT) AS "Average Pageview per Session",
ROUND(SUM(CASE WHEN duration < 0 THEN 0 ELSE duration END)::FLOAT/COUNT(DISTINCT(session))) AS "Average Duration per Session"
FROM table1
WHERE url ILIKE '%%utm_source%%'
AND ts>='2018-05-01'
AND ts < '2018-06-01'
GROUP BY 1
ORDER BY 2 DESC;
--add bounce rate query into first--
SELECT
CAST((CAST((SUM(bounces)*100) AS FLOAT)/CAST(COUNT(*) AS FLOAT)) AS VARCHAR(5)) + '%' as "Bounce rate"
FROM (
SELECT
MIN(ts) AS "time_first_viewed",
cuuid,
session,
COUNT(*) as "number_of_events",
CASE WHEN count(*) = 1 THEN 1 ELSE 0 END AS bounces
FROM table1
WHERE ts>='2018-05-01'
AND ts < '2018-06-01'
GROUP BY cuuid, session)
For the final result, I need everything in a single result table, with these columns:
1) UTM Source
2) Unique Visitor
3) Unique Sessions
4) Page View
5) Session/Visitor
6) Pageview/session
7) Avg Duration
8) Bounce Rate
You can just combine the two queries with a comma (a cross join), like below. Hope this works:
Select * from
(SELECT
TOP 10 regexp_replace(regexp_substr(url, 'utm_source\\=[^\\&]*'), 'utm_source=') AS "UTM Source",
COUNT(DISTINCT(cuuid)) as "Total Unique Visitors",
COUNT(DISTINCT(session)) as "Total Unique Sessions",
COUNT(*) as "Total Page Views",
CAST(COUNT(DISTINCT(session)) AS FLOAT)/CAST(COUNT(DISTINCT(cuuid)) AS FLOAT) AS "Average Sessions per Visitor",
CAST(COUNT(*) AS FLOAT)/CAST(COUNT(DISTINCT(session)) AS FLOAT) AS "Average Pageview per Session",
ROUND(SUM(CASE WHEN duration < 0 THEN 0 ELSE duration END)::FLOAT/COUNT(DISTINCT(session))) AS "Average Duration per Session"
FROM table1
WHERE url ILIKE '%%utm_source%%'
AND ts>='2018-05-01'
AND ts < '2018-06-01'
GROUP BY 1
ORDER BY 2 DESC) q1
,
(SELECT
CAST((CAST((SUM(bounces)*100) AS FLOAT)/CAST(COUNT(*) AS FLOAT)) AS VARCHAR(5)) || '%' as "Bounce rate"
FROM (
SELECT
MIN(ts) AS "time_first_viewed",
cuuid,
session,
COUNT(*) as "number_of_events",
CASE WHEN count(*) = 1 THEN 1 ELSE 0 END AS bounces
FROM table1
WHERE ts>='2018-05-01'
AND ts < '2018-06-01'
GROUP BY cuuid, session) s) q2
You just define alias names for your columns and it's done.
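For readability, the same comma join can also be written as an explicit CROSS JOIN with the aliases spelled out. Below is a minimal sketch, assuming the same Redshift-style SQL and table as above, trimmed to a couple of columns just to show the shape:
SELECT q1."UTM Source",
       q1."Total Unique Visitors",
       q2."Bounce rate"
FROM (
    SELECT TOP 10
           regexp_replace(regexp_substr(url, 'utm_source\\=[^\\&]*'), 'utm_source=') AS "UTM Source",
           COUNT(DISTINCT cuuid) AS "Total Unique Visitors"
    FROM table1
    WHERE url ILIKE '%%utm_source%%'
      AND ts >= '2018-05-01' AND ts < '2018-06-01'
    GROUP BY 1
    ORDER BY 2 DESC
) q1
CROSS JOIN (
    -- a session with exactly one page view counts as a bounce
    SELECT SUM(bounces) * 100.0 / COUNT(*) AS "Bounce rate"
    FROM (
        SELECT session,
               CASE WHEN COUNT(*) = 1 THEN 1 ELSE 0 END AS bounces
        FROM table1
        WHERE ts >= '2018-05-01' AND ts < '2018-06-01'
        GROUP BY cuuid, session
    ) s
) q2;
The CROSS JOIN simply attaches the single overall bounce-rate row to each of the ten UTM-source rows.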
I have a table in GBQ in the following format:
UserId Orders Month
XDT 23 1
XDT 0 4
FKR 3 6
GHR 23 4
... ... ...
It shows the number of orders per user and month.
I want to calculate the percentage of users who have orders. I did it as follows:
SELECT
HasOrders,
ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
SELECT
*,
CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
FROM `Table` )
GROUP BY
HasOrders
ORDER BY
Parts
It gives me the following result:
HasOrders Parts
0 35
1 65
I need to calculate the percentage of users who have orders, by month, in a way that every month = 100%
Currently, to do this I execute the query once per month, which is not practical:
SELECT
HasOrders,
ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
SELECT
*,
CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
FROM `Table` )
WHERE Month = 1
GROUP BY
HasOrders
ORDER BY
Parts
Is there a way to execute the query once and get this result?
HasOrders Parts Month
0 25 1
1 75 1
0 45 2
1 55 2
... ... ...
SELECT
SIGN(Orders),
ROUND(COUNT(*) * 100.000 / SUM(COUNT(*)) OVER (PARTITION BY Month), 2) AS Parts,
Month
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)
Demo on Postgres:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=4cd2d1455673469c2dfc060eccea8020
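Since the question is about BigQuery while the demo above runs on Postgres, here is a hedged BigQuery-flavored sketch of the same idea (untested; it assumes the table and column names from the question):
SELECT
  Month,
  SIGN(Orders) AS HasOrders,
  ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (PARTITION BY Month), 2) AS Parts
FROM `Table`
GROUP BY Month, HasOrders
ORDER BY Month, HasOrders
In BigQuery the / operator already returns FLOAT64, so no explicit CAST is needed before rounding.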
You've stated that it's important for the total to be 100%, so you might consider rounding down in the "no orders" case and rounding up in the "has orders" case for those scenarios where the percentage falls precisely on an odd multiple of 0.5%. Or perhaps rounding toward even, or rounding the smallest value down, would be better options:
WITH DATA AS (
SELECT SIGN(Orders) AS HasOrders, Month,
COUNT(*) * 100.000 / SUM(COUNT(*)) OVER (PARTITION BY Month) AS PartsPercent
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)
)
select HasOrders, Month, PartsPercent,
PartsPercent - TRUNC(PartsPercent) AS Fraction,
CASE WHEN HasOrders = 0
THEN FLOOR(PartsPercent) ELSE CEILING(PartsPercent)
END AS PartsRound0Down,
CASE WHEN PartsPercent - TRUNC(PartsPercent) = 0.5
AND MOD(TRUNC(PartsPercent), 2) = 0
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsRoundTowardEven,
CASE WHEN PartsPercent - TRUNC(PartsPercent) = 0.5 AND PartsPercent < 50
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsSmallestTowardZero
from DATA
It's usually not advisable to test floating-point values for equality, and I don't know how BigQuery's float64 will behave in the comparison against 0.5. One half is, however, exactly representable in binary. See these computed for a case where the breakout is 101 vs 99. I don't have immediate access to BigQuery, so be aware that Postgres's rounding behavior is different:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=c8237e272427a0d1114c3d8056a01a09
Consider the approach below:
select hasOrders, round(100 * parts, 2) as parts, month from (
select month,
countif(orders = 0) / count(*) `0`,
countif(orders > 0) / count(*) `1`,
from your_table
group by month
)
unpivot (parts for hasOrders in (`0`, `1`))
with output in the requested hasOrders / parts / month shape.
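Here UNPIVOT turns the two per-month fraction columns `0` and `1` back into rows, so each month contributes exactly two rows whose parts values sum to 100.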
I have two tables, Trips & Users.
I am trying to write a SQL query to find the cancellation rate of requests with unbanned users (both client and driver must not be banned) each day between
"2013-10-01" and "2013-10-03", with the cancellation rate rounded in the output.
So in my case, if Total Trips is 4 and Cancellations is 2 then I want to see .50, or if 3 and 1 then .33.
Here is what I was trying to do so far...
With CTE AS (
Select Request_at as [DAY],
SUM (CASE
When Status= 'cancelled_by_driver' or Status='cancelled_by_client' THEN 1
Else 0
END)
AS Cancellations,
Count(*) as TotalTrips
FROM Trips
Where Client_id NOT IN (SELECT [Users_id] FROM [Study].[dbo].[Users] Where Banned='YES' and Roll='Client') or Driver_id NOT IN (SELECT [Users_id] FROM [Study].[dbo].[Users] Where Banned='YES' and Roll='Driver')
Group By Request_at )
Select DAY, ((TotalTrips/Cancellations) *100) as CancellationRate
From CTE
But the division is not working; I get an error and I am not sure how else to approach this.
Any help is appreciated.
Try this one. Cast to float so the division isn't integer division, and divide cancellations by total trips (not the other way around):
With CTE AS (
Select Request_at as [DAY],
SUM (CASE
When Status= 'cancelled_by_driver' or Status='cancelled_by_client' THEN 1
Else 0
END)
AS Cancellations,
Count(*) as TotalTrips
FROM Trips
Where Client_id NOT IN (SELECT [Users_id] FROM [Study].[dbo].[Users] Where Banned='YES' and Roll='Client') or Driver_id NOT IN (SELECT [Users_id] FROM [Study].[dbo].[Users] Where Banned='YES' and Roll='Driver')
Group By Request_at )
Select DAY, convert(float, Cancellations) / TotalTrips as CancellationRate
From CTE
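Casting one operand to float is what prevents T-SQL integer division, and since TotalTrips comes from COUNT(*) it can never be zero for a group, so no divide-by-zero guard is needed once cancellations are divided by total trips.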
Since we have to query users_id twice, we can put that in a CTE. There is no need for a second query to calculate the percentage.
With Unbanned_users As (
Select [users_id] FROM [Study].[dbo].[Users]
Where Banned<>'YES'
)
Select Request_at as [DAY],
(100*Count(CASE When Status Like 'cancelled%' Then 1 End)+Count(*)/2)
/Count(*) as Cancellation_percent
From Trips t
Inner Join Unbanned_users cl ON t.client_id=cl.users_id
Inner Join Unbanned_users dr ON t.driver_id=dr.users_id
Group By Request_at
This assumes that all of the user ids in Trips, both client and driver, are defined in the Users table.
The +Count(*)/2 is to round 2 out of 3 to 67, instead of the truncated result of 66.
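If a decimal such as .50 or .33 is preferred over a whole-number percentage (as in the question's examples), a minimal T-SQL sketch of the same join approach would be:
With Unbanned_users As (
    Select [users_id] FROM [Study].[dbo].[Users]
    Where Banned<>'YES'
)
Select Request_at as [DAY],
       -- multiplying by 1.0 forces decimal division; round to two places
       Round(Count(CASE When Status Like 'cancelled%' Then 1 End) * 1.0 / Count(*), 2) as Cancellation_Rate
From Trips t
Inner Join Unbanned_users cl ON t.client_id = cl.users_id
Inner Join Unbanned_users dr ON t.driver_id = dr.users_id
Group By Request_at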
This is the output I am getting now, but I want all the records for one gateway in one row. I am trying to find the damage count and total count of packages processed by an airport in a week. Currently I am grouping by airport and week, so I get separate rows for each airport and week; I want the records for a particular airport in a single row, with the weeks on that same row.
I tried putting a conditional group by but that did not work.
select tmp.gateway,tmp.weekbucket, sum(tmp.damaged_count) as DamageCount, sum(tmp.total_count) as TotalCount, round(sum(tmp.DPMO),0) as DPMO from
(
select a.gateway,
date_trunc('week', (a.processing_date + interval '1 day')) - interval '1 day' as weekbucket,
count(distinct(b.fulfillment_shipment_id||b.package_id)) as damaged_count,
count(distinct(a.fulfillment_shipment_id||a.package_id)) as total_count,
count(distinct(b.fulfillment_shipment_id||b.package_id))*1.00/count(distinct(a.Fulfillment_Shipment_id || a.package_id))*1000000 as DPMO
from booker.d_air_shipments_na a
left join trex.d_ps_packages b
on (a.fulfillment_shipment_id||a.package_id =b.Fulfillment_Shipment_id||b.package_id)
where a.processing_date >= current_date-7
and (exception_summary in ('Reprint-Damaged Label') or exception_summary IS NULL)
and substring(route, position(a.gateway IN route) +6, 1) <> 'K'
group by a.gateway, weekbucket) as tmp
group by tmp.gateway, tmp.weekbucket
order by tmp.gateway, tmp.weekbucket desc;
Since the date range spans parts of two weeks, you are likely to get two rows for each gateway. You can try removing weekbucket from the outer GROUP BY (keeping it in the inner select) and taking MAX(weekbucket) instead, so that the counts from both the start-of-week and end-of-week dates are summed into a single row per gateway:
select
tmp.gateway, max(tmp.weekbucket),
sum(tmp.damaged_count) as DamageCount,
sum(tmp.total_count) as TotalCount,
round(sum(tmp.DPMO),0) as DPMO
from
(
select a.gateway,
date_trunc('week', (a.processing_date + interval '1 day')) - interval '1 day' as weekbucket,
count(distinct(b.fulfillment_shipment_id||b.package_id)) as damaged_count,
count(distinct(a.fulfillment_shipment_id||a.package_id)) as total_count,
count(distinct(b.fulfillment_shipment_id||b.package_id))*1.00/count(distinct(a.Fulfillment_Shipment_id || a.package_id))*1000000 as DPMO
from booker.d_air_shipments_na a
left join trex.d_ps_packages b
on (a.fulfillment_shipment_id||a.package_id = b.Fulfillment_Shipment_id||b.package_id)
where a.processing_date >= current_date-7
and (exception_summary in ('Reprint-Damaged Label') or exception_summary IS NULL)
and substring(route, position(a.gateway IN route) +6, 1) <> 'K'
group by a.gateway, weekbucket) as tmp
group by tmp.gateway
order by tmp.gateway, max(tmp.weekbucket) desc;
So you want to pivot the two weeks into a single row, with two sets of aggregates?
select
tmp.gateway,
min(case when rn = 1 then tmp.damaged_count end) as DamageCountWeek1,
min(case when rn = 2 then tmp.damaged_count end) as DamageCountWeek2,
min(case when rn = 1 then tmp.total_count end) as TotalCountWeek1,
min(case when rn = 2 then tmp.total_count end) as TotalCountWeek2,
min(case when rn = 1 then round(tmp.DPMO, 0) end) as DPMOWeek1,
min(case when rn = 2 then round(tmp.DPMO, 0) end) as DPMOWeek2
from (
select row_number() over (partition by gateway order by weekbucket) as rn,
...
) as tmp
group by tmp.gateway
order by tmp.gateway;
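Here rn = 1 and rn = 2 pick out the earlier and later weekbucket for each gateway (because row_number() orders by weekbucket), so the two weeks land in separate column sets on the same row.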
In the following, I need to add the total number of orders per order type (IHORDT). I tried count(t01.ihordt), but it's not valid. I need this order total to get the average amount per order.
Data expected:
Current:
IHORDT current year previous year
RTR 100,000 90,000
INT 2,000,000 1,500,000
New change: add one more column to the above:
Total orders
RTR 100
INT 1000
SELECT T01.IHORDT
-- summarize by current year and previous year
,SUM( CASE WHEN YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP) - 1
THEN (T02.IDSHP#*T02.IDNTU$) ELSE 0 END) AS LastYear
,SUM( CASE WHEN YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP)
THEN (T02.IDSHP#*T02.IDNTU$) ELSE 0 END) AS CurYear
FROM ASTDTA.OEINHDIH T01
INNER JOIN ASTDTA.OEINDLID T02
ON T01.IHORD# = T02.IDORD#
WHERE T01.IHORDT in ('RTR', 'INT')
--------------------------------------------------------
AND ( YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP) - 1
OR YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP))
GROUP BY T01.IHORDT
To receive a count of records in a group you need to use count(*).
So here is a generic example:
select order_type,
sum(order_amount) as total_sales,
count(*) as number_of_orders
from order_header
group by order_type;
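Applied to the query in the question, it could look like the sketch below. This is hedged: because the header table is joined to the detail table, COUNT(*) would count detail lines, so counting distinct order numbers (T01.IHORD#, taken from the join condition) is assumed to be what "total orders" means here:
SELECT T01.IHORDT
      ,SUM( CASE WHEN YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP) - 1
                 THEN (T02.IDSHP#*T02.IDNTU$) ELSE 0 END) AS LastYear
      ,SUM( CASE WHEN YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP)
                 THEN (T02.IDSHP#*T02.IDNTU$) ELSE 0 END) AS CurYear
      -- one header row joins to many detail lines, so count distinct orders
      ,COUNT(DISTINCT T01.IHORD#) AS TotalOrders
FROM ASTDTA.OEINHDIH T01
INNER JOIN ASTDTA.OEINDLID T02
   ON T01.IHORD# = T02.IDORD#
WHERE T01.IHORDT in ('RTR', 'INT')
  AND ( YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP) - 1
     OR YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP))
GROUP BY T01.IHORDT
If separate per-year order counts are wanted, the same CASE pattern can be moved inside the aggregate, e.g. COUNT(DISTINCT CASE WHEN YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP) THEN T01.IHORD# END).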
I have two queries joined with a UNION ALL.
SELECT 'Finished' AS Status, amount AS amount, units As Date
from table1 WHERE Pdate > cdate AND name =#name
UNION ALL
SELECT 'Live' AS Live, amount, units
from table1 Where Pdate = cdate And name =#name
Result
Status amount units
Finished 100 20
Live 200 10
When either of the queries fetches an empty set I get only one row, and if both fetch empty sets I get no rows.
So how can I get a result like this:
Status amount Units
Finished 100 20
Live 0 0
OR
Status amount Units
Finished 0 0
Live 200 10
OR
Status amount Units
Finished 0 0
Live 0 0
Thanks.
I would think you can do it using SUM, since an aggregate with no GROUP BY always returns exactly one row, even over an empty set. SUM returns NULL rather than 0 when there are no rows, so wrap it as COALESCE(SUM(amount), 0) AS amount.
SELECT 'Finished' AS Status, sum(amount) AS amount, sum(units) As Unit
from table1 WHERE Pdate > cdate AND name =#name
UNION ALL
SELECT 'Live' AS Status, sum(amount) as amount, sum(units) as Unit
from table1 Where Pdate = cdate And name =#name
And if you are not trying to sum the results, note that a plain COALESCE(amount, 0) AS amount will not add a row when nothing matches; it's the aggregate with no GROUP BY that guarantees one row per branch.
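For reference, a minimal sketch with COALESCE applied to the aggregates, using the same table and parameter as the question:
SELECT 'Finished' AS Status, COALESCE(SUM(amount), 0) AS amount, COALESCE(SUM(units), 0) AS units
FROM table1 WHERE Pdate > cdate AND name = #name
UNION ALL
SELECT 'Live', COALESCE(SUM(amount), 0), COALESCE(SUM(units), 0)
FROM table1 WHERE Pdate = cdate AND name = #name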
I just want to point out that your query is needlessly complex, with nested selects and a union all. A better way to write the query is:
select (case when pdate > cdate then 'Finished' else 'Live' end) AS Status,
amount AS amount, units As Date
from table1
WHERE Pdate >= cdate AND name = #name
This query does not produce what you want, since it only produces rows where there is data.
One way to get the additional rows is to augment the original data and then check if it is needed.
select status, amount, units as Date
from (select Status, amount, units,
             row_number() over (partition by status order by amount desc, units desc) as seqnum
      from ((select (case when pdate > cdate then 'Finished' else 'Live' end) AS Status,
                    amount, units, name
             from table1
             WHERE Pdate >= cdate AND name = #name
            ) union all
            (select 'Finished', 0, 0, #name
            ) union all
            (select 'Live', 0, 0, #name
            )
           ) x
     ) t
where (amount > 0 or units > 0) or
      (seqnum = 1)
This adds in the extra rows that you want. It then enumerates the rows so that the added all-zero rows sort last within each status; they are ignored unless they are the first in the sequence (i.e. there is no real data for that status).
Try something like this
with stscte as
(
select 'Finished' as status
union all
select 'Live'
),
datacte
as(
select 'Finished' AS Status, amount AS amount, units As units
from table1 WHERE Pdate > cdate AND name =#name
UNION ALL
select 'Live' ,amount,units
from table1 Where Pdate = cdate And name =#name
)
select sc.status, isnull(dc.amount,0) as amount, isnull(dc.units,0) as units
from stscte sc left join datacte dc
on sc.status = dc.status
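The left join from stscte guarantees that both the 'Finished' and 'Live' rows appear even when datacte returns nothing for one (or both) of them, and ISNULL fills in the missing amounts and units with 0.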