How can I convert a 5-column CUBE GROUP BY in Teradata to BQ? - google-bigquery

I'm trying to convert the following Teradata code to BQ. Since BQ doesn't support a CUBE GROUP BY, how do we rewrite this so that the cube functionality remains in the output? There are 8 grouping expressions, so I suspect we would need multiple UNION statements to convert it, but that is not efficient. Is there any other way?
sel
cal_dt
,month_id
,f_dt
,coalesce(Product,'_Overall') Product
,coalesce(Partner,'_Overall') Partner
,coalesce(cntry,'_Overall') cntry
,coalesce(Channel,'_Overall') Channel
,coalesce(Model,'_Overall') Model
,sum(prod_vol ) Prod
,sum(prod_vol_EUR ) prod_vol_EUR
,sum(prod_vol_AUD ) prod_vol_AUD
,sum(prod_vol_CAD ) prod_vol_CAD
,sum(prod_vol_GBP ) prod_vol_GBP
,sum(Revenue ) Revenue
,sum(Revenue_EUR ) Revenue_EUR
,sum(Revenue_AUD ) Revenue_AUD
,sum(Revenue_CAD ) Revenue_CAD
,sum(Revenue_GBP ) Revenue_GBP
,sum(Txns) Txns
,sum( actv_sndrs) actv_sndrs
,sum( actv_rcvrs) actv_rcvrs
,sum(actvns) actvns
from (sel
a.cal_dt
,a.month_id
,f_dt
,cast(Product as varchar(40)) Product
,cast(Partner as varchar(40)) Partner
,cast(cntry as varchar(40)) cntry
,cast(Channel as varchar(40)) Channel
,cast(Model as varchar(40)) Model
,sum(prod_vol ) prod_vol
,sum(prod_vol_EUR ) prod_vol_EUR
,sum(prod_vol_AUD ) prod_vol_AUD
,sum(prod_vol_CAD ) prod_vol_CAD
,sum(prod_vol_GBP ) prod_vol_GBP
,sum(Revenue ) Revenue
,sum(Revenue_EUR ) Revenue_EUR
,sum(Revenue_AUD ) Revenue_AUD
,sum(Revenue_CAD ) Revenue_CAD
,sum(Revenue_GBP ) Revenue_GBP
,sum(Txns ) Txns
,count(distinct case when Txns>0 then sndr_id end) actv_sndrs
,count(distinct case when Txns>0 then rcvr_id end) actv_rcvrs
,sum(actvn) actvns
from base.table a
join
(sel month_id, min(calendar_dt) f_dt from calender.table group by 1) cal on a.month_id=cal.month_id
group by 1,2,3,cube(4,5,6,7,8)
) x
group by 1,2,3,4,5,6,7,8
I'm trying to rewrite this in an efficient way for BQ.
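One way to avoid 32 UNION ALL branches is to fan each row out against a bit mask over the five CUBE dimensions and group once, which is roughly what CUBE does internally and needs only a single scan of the base table. Below is a minimal sketch of that idea in BigQuery, reusing the table and column names from the query above and keeping only some of the SUM columns for brevity; because '_Overall' is produced directly, the outer COALESCE pass is no longer needed. (Recent BigQuery releases have also added GROUP BY GROUPING SETS / CUBE / ROLLUP; if that is available in your project, the original GROUP BY can be ported almost verbatim.)
-- Sketch only: emulate CUBE(Product, Partner, cntry, Channel, Model) by
-- cross joining every row with bit masks 0..31 and blanking the masked
-- dimensions, then grouping once. Adjust names and types to your schema.
SELECT
  a.cal_dt,
  a.month_id,
  cal.f_dt,
  IF((mask & 1)  != 0, '_Overall', CAST(Product AS STRING)) AS Product,
  IF((mask & 2)  != 0, '_Overall', CAST(Partner AS STRING)) AS Partner,
  IF((mask & 4)  != 0, '_Overall', CAST(cntry   AS STRING)) AS cntry,
  IF((mask & 8)  != 0, '_Overall', CAST(Channel AS STRING)) AS Channel,
  IF((mask & 16) != 0, '_Overall', CAST(Model   AS STRING)) AS Model,
  SUM(prod_vol)                               AS prod_vol,
  SUM(Revenue)                                AS Revenue,
  -- ...repeat for the remaining SUM columns from the original query...
  SUM(Txns)                                   AS Txns,
  COUNT(DISTINCT IF(Txns > 0, sndr_id, NULL)) AS actv_sndrs,
  COUNT(DISTINCT IF(Txns > 0, rcvr_id, NULL)) AS actv_rcvrs,
  SUM(actvn)                                  AS actvns
FROM base.table a
JOIN (
  SELECT month_id, MIN(calendar_dt) AS f_dt
  FROM calender.table
  GROUP BY 1
) cal ON a.month_id = cal.month_id
CROSS JOIN UNNEST(GENERATE_ARRAY(0, 31)) AS mask  -- 2^5 grouping combinations
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8;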

Related

SQL - find row with closest date but different column value

I'm new to SQL and I need some help.
I have a table and, for each item B in the table, I need to find the item A with the closest date. In this case that is the A at 02.09.2021 04:25:30.
Date                  Item
07.09.2021 05:02:05   A
06.09.2021 05:01:02   A
05.09.2021 05:00:02   A
04.09.2021 04:59:01   A
03.09.2021 04:58:03   A
02.09.2021 04:56:55   A
02.09.2021 04:33:56   B
02.09.2021 04:25:30   A
WITH CTE(DATE, ITEM) AS
(
SELECT '20210907 05:02:05', 'A' UNION ALL
SELECT '20210906 05:01:02', 'A' UNION ALL
SELECT '20210905 05:00:02', 'A' UNION ALL
SELECT '20210904 04:59:01', 'A' UNION ALL
SELECT '20210903 04:58:03', 'A' UNION ALL
SELECT '20210902 04:56:55', 'A' UNION ALL
SELECT '20210902 04:33:56', 'B' UNION ALL
SELECT '20210902 04:25:30', 'A'
)
SELECT
CAST(C.DATE AS DATETIME) X_DATE, C.ITEM, Q.CLOSEST
FROM CTE AS C
OUTER APPLY
(
SELECT TOP 1 CAST(X.DATE AS DATETIME) CLOSEST
FROM CTE AS X
WHERE X.ITEM = 'A' AND CAST(X.DATE AS DATETIME) < CAST(C.DATE AS DATETIME)
ORDER BY CAST(X.DATE AS DATETIME) DESC -- descending, so TOP 1 is the closest earlier A row
) Q
WHERE C.ITEM = 'B'
You can use the OUTER APPLY approach as in the query above.
Please also note that the datetime column (DATE) is written in an ISO-compliant format.
Your data has only two columns. If you want only the closest A timestamp, then the fastest way is probably window functions:
select t.*,
(case when prev_a_date is null then next_a_date
when next_a_date is null then prev_a_date
when datediff(second, prev_a_date, date) <= datediff(second, date, next_a_date) then prev_a_date
else next_a_date
end) as a_date
from (select t.*,
max(case when item = 'A' then date end) over (order by date) as prev_a_date,
min(case when item = 'A' then date end) over (order by date desc) as next_a_date
from t
) t
where item = 'B';
This uses seconds to measure the time difference, but you can use a smaller unit if appropriate.
You can also do this using apply if you have more columns from the "A" rows that you want:
select tb.*, a.*
from t tb outer apply
     (select top (1) ta.*
      from t ta
      where ta.item = 'A'
      order by abs(datediff(second, ta.date, tb.date))
     ) a
where tb.item = 'B';

Convert CTE Query into normal Query

I want to convert my PostgreSQL CTE query into a normal query, because CTEs are mainly used in data-warehouse SQL and are not efficient for Postgres production DBs.
So I need help converting this CTE query into a normal query.
WITH
cohort AS (
SELECT
*
FROM (
select
activity_id,
ts,
customer,
activity,
case
when activity = 'completed_order' and lag(activity) over (partition by customer order by ts) != 'email'
then null
when activity = 'email' and lag(activity) over (partition by customer order by ts) !='email'
then 1
else 0
end as cndn
from activity_stream where customer in (select customer from activity_stream where activity='email')
order by ts
) AS s
)
(
select
*
from cohort as s
where cndn = 1 OR cndn is null order by ts)
You may just inline the CTE into your outer query:
select *
from
(
select activity_id, ts, customer, activity,
case when activity = 'completed_order' and lag(activity) over (partition by customer order by ts) != 'email'
then null
when activity = 'email' and lag(activity) over (partition by customer order by ts) !='email'
then 1
else 0
end as cndn
from activity_stream
where customer in (select customer from activity_stream where activity = 'email')
) as s
where cndn = 1 OR cndn is null
order by ts;
Note that you have an unnecessary subquery in the CTE, whose ORDER BY won't "stick" anyway. But other than this, you might want to keep your current code as is.
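If you prefer to keep the CTE, a sketch of it with that redundant inner subquery (and its non-binding ORDER BY) removed, using the same table and columns:
WITH cohort AS (
    select activity_id, ts, customer, activity,
           case
               when activity = 'completed_order'
                    and lag(activity) over (partition by customer order by ts) != 'email'
                   then null
               when activity = 'email'
                    and lag(activity) over (partition by customer order by ts) != 'email'
                   then 1
               else 0
           end as cndn
    from activity_stream
    where customer in (select customer from activity_stream where activity = 'email')
)
select *
from cohort
where cndn = 1 or cndn is null
order by ts;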

Display two counts in two different columns from a single table

I have a table where I record daily work of employees. I have a query where I display the current work for today for each employee and have another query where I display the total count of work for each employee.
I want to combine the 2 queries into a single one where I have a daily column and a cumulative column.
My query is below:
SELECT staff,
process_inprogress,
not_yet_completed
FROM (SELECT staff,
Count(number) AS Process_InProgress,
Count(team_name) AS Not_Yet_Completed
FROM dbo.empty_shell_workflow
WHERE ( end_date IS NULL )
AND ( process_name IS NOT NULL )
AND ( billing_amount IS NULL )
AND ( deletion IS NULL )
AND ( team_name = 'Team Vishma' )
AND ( CONVERT(DATE, start_date) = CONVERT(DATE, Getdate()) )
GROUP BY staff
UNION ALL
SELECT staff,
Count(number) AS Process_InProgress,
Count(team_name) AS Not_Yet_Completed
FROM dbo.empty_shell_workflow AS Empty_Shell_Workflow_1
WHERE ( team_name = 'Team Vishma' )
AND ( billing_amount IS NULL )
AND ( tag_number IS NULL )
AND ( initiator IS NOT NULL )
AND ( end_date IS NULL )
AND ( deletion IS NULL )
AND ( process_name IS NOT NULL )
GROUP BY staff) AS t
However, it is being displayed in a single column for both daily and cumulative.
Below is how I want it to display:
Staff   Process_Progress (Daily)   Not_YetCompleted (Cumulative)
A       2                          5
B       0                          1
C       6                          8
However, from the query above the cumulative is being displayed in the daily column.
Any idea how I can modify the query?
You could try something like the below, using CASE WHEN:
with cte as
(
SELECT staff,
       CONVERT(DATE, start_date) AS date_of_month,
       Count(number) AS Process_InProgress
FROM dbo.empty_shell_workflow AS Empty_Shell_Workflow_1
WHERE ( team_name = 'Team Vishma' )
AND ( billing_amount IS NULL )
AND ( tag_number IS NULL )
AND ( initiator IS NOT NULL )
AND ( end_date IS NULL )
AND ( deletion IS NULL )
AND ( process_name IS NOT NULL )
GROUP BY staff, CONVERT(DATE, start_date)
)
select staff,
       sum(case when date_of_month = CONVERT(DATE, Getdate())
                then Process_InProgress else 0 end) as Process_Progress_Daily,
       sum(case when date_of_month != CONVERT(DATE, Getdate())
                then Process_InProgress else 0 end) as Not_YetCompleted
from cte
group by staff
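The CTE above applies a single set of filters to both columns. If the two columns need the two different filter sets from the original UNION ALL branches, one possible single-pass version uses conditional aggregation; a sketch with the same table and columns as the question, keeping the shared filters in the WHERE clause and moving the branch-specific ones into the CASE expressions:
SELECT staff,
       -- daily: today's rows, as in the first branch of the UNION ALL
       COUNT(CASE WHEN CONVERT(DATE, start_date) = CONVERT(DATE, GETDATE())
                  THEN number END) AS Process_InProgress_Daily,
       -- cumulative: all rows matching the second branch's extra filters
       COUNT(CASE WHEN tag_number IS NULL AND initiator IS NOT NULL
                  THEN team_name END) AS Not_Yet_Completed_Cumulative
FROM dbo.empty_shell_workflow
WHERE team_name = 'Team Vishma'
  AND billing_amount IS NULL
  AND end_date IS NULL
  AND deletion IS NULL
  AND process_name IS NOT NULL
GROUP BY staff;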

Group by in columns and rows, counts and percentages per day

I have a table that has data like following.
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group them and count by the attr column row-wise, and also create additional columns to show their counts per day and percentages, as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count by using GROUP BY, but I can't figure out how to separate them into multiple columns. I tried to generate the day1 percentage with
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this is also not giving me the correct answer; I'm getting all zeroes for the percentage and 1 for the count. Any help is appreciated. I'm trying to do this in Redshift, which follows PostgreSQL syntax.
Let's nail the logic before presenting:
with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
group by attr, DATEPART(day, time)
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount,
       1.0 * t1.thecount / t2.daytotal as percentofday  -- 1.0 * avoids integer division returning 0
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
From here you can pivot to create a day-by-day view if you feel the need.
I am trying to enhance @JohnHC's query. By the way, if you need 7 days, then you have to add those days in the CASE WHEN.
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr,t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
In case you have only 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already mentioned in the other answers. Instead of joining CTEs to calculate the values, I am using window functions, which is a bit shorter and more readable, I think. The pivot is done the same way.
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
A: counting the rows per day, and the rows per day AND attr
B: for more readability I convert the date into a number: the difference between the row's date and the maximum date in the table, giving a counter from 0 (first day) up to n - 1 (last day)
C: calculating the percentage and rounding
D: pivot by filtering on the day numbers. The COALESCE avoids NULL values by switching them to 0. To add more days you can duplicate these column pairs.
Edit: Made the day counter more flexible for more days; new SQL Fiddle
Basically, I see this as conditional aggregation. But you need to get an enumerator for the date for the pivoting. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
FROM test_table
) s
GROUP BY attr, cnt
ORDER BY attr;
Here is a SQL Fiddle.
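If the target engine does not accept the FILTER clause (the question mentions Redshift), the same conditional aggregation can be written with CASE expressions; a sketch of the last query in that form:
SELECT attr,
       SUM(CASE WHEN day_number = 1 THEN 1 ELSE 0 END)       AS day1_count,
       SUM(CASE WHEN day_number = 1 THEN 1 ELSE 0 END) / cnt AS day1_percent,
       SUM(CASE WHEN day_number = 2 THEN 1 ELSE 0 END)       AS day2_count,
       SUM(CASE WHEN day_number = 2 THEN 1 ELSE 0 END) / cnt AS day2_percent
FROM (SELECT attr,
             DENSE_RANK() OVER (ORDER BY time::date DESC) AS day_number,
             1.0 * COUNT(*) OVER (PARTITION BY attr)      AS cnt
      FROM test_table
     ) s
GROUP BY attr, cnt
ORDER BY attr;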

TSQL Row_Number

This question has been covered similarly before, but I'm struggling.
I need to find the top N sales based on customer buying patterns.
Ideally this needs to be top N by customer by month period by year, but for now I'm just looking at top N over the whole DB.
My query looks like:
-- QUERY TO SHOW TOP 2 CUSTOMER INVOICES BY CUSTOMER BY MONTH
SELECT
bill_to_code,
INVOICE_NUMBER,
SUM( INVOICE_AMOUNT_CORP ) AS 'SALES',
ROW_NUMBER() OVER ( PARTITION BY bill_to_code ORDER BY SUM( INVOICE_AMOUNT_CORP ) DESC ) AS 'Row'
FROM
FACT_OM_INVOICE
JOIN dim_customer_bill_to ON FACT_OM_INVOICE.dim_customer_bill_to_key = dim_customer_bill_to.dim_customer_bill_to_key
--WHERE
-- 'ROW' < 2
GROUP BY
invoice_number,
Dim_customer_bill_to.bill_to_code
I can't understand the solutions given for restricting Row to <= N.
Please help.
Try this.
-- QUERY TO SHOW TOP 2 CUSTOMER INVOICES BY CUSTOMER BY MONTH
;WITH Top2Customers
AS
(
SELECT
bill_to_code,
INVOICE_NUMBER,
SUM( INVOICE_AMOUNT_CORP ) AS 'SALES',
ROW_NUMBER() OVER ( PARTITION BY bill_to_code ORDER BY SUM( INVOICE_AMOUNT_CORP ) DESC )
AS 'RowNumber'
FROM
FACT_OM_INVOICE
JOIN dim_customer_bill_to ON FACT_OM_INVOICE.dim_customer_bill_to_key = dim_customer_bill_to.dim_customer_bill_to_key
GROUP BY
invoice_number,
Dim_customer_bill_to.bill_to_code
)
SELECT * FROM Top2Customers WHERE RowNumber < 3
You have to wrap your SELECT in another query to use the value produced by ROW_NUMBER():
select * from (
SELECT
bill_to_code,
INVOICE_NUMBER,
SUM( INVOICE_AMOUNT_CORP ) AS SALES,
ROW_NUMBER() OVER ( PARTITION BY bill_to_code ORDER BY SUM( INVOICE_AMOUNT_CORP ) DESC ) AS RowNo
FROM
FACT_OM_INVOICE
JOIN dim_customer_bill_to ON FACT_OM_INVOICE.dim_customer_bill_to_key = dim_customer_bill_to.dim_customer_bill_to_key
--WHERE
-- 'ROW' < 2
GROUP BY
invoice_number,
Dim_customer_bill_to.bill_to_code
) base where RowNo <= 2  -- top 2 invoices per customer
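For the top N by customer by month by year goal mentioned in the question, the year and month also need to go into the PARTITION BY (and GROUP BY). A sketch along those lines, where INVOICE_DATE is a placeholder for whatever date column FACT_OM_INVOICE actually has:
SELECT *
FROM (
    SELECT
        dim_customer_bill_to.bill_to_code,
        YEAR(FACT_OM_INVOICE.INVOICE_DATE)  AS invoice_year,
        MONTH(FACT_OM_INVOICE.INVOICE_DATE) AS invoice_month,
        FACT_OM_INVOICE.INVOICE_NUMBER,
        SUM(FACT_OM_INVOICE.INVOICE_AMOUNT_CORP) AS SALES,
        ROW_NUMBER() OVER (
            PARTITION BY dim_customer_bill_to.bill_to_code,
                         YEAR(FACT_OM_INVOICE.INVOICE_DATE),
                         MONTH(FACT_OM_INVOICE.INVOICE_DATE)
            ORDER BY SUM(FACT_OM_INVOICE.INVOICE_AMOUNT_CORP) DESC
        ) AS RowNo
    FROM FACT_OM_INVOICE
    JOIN dim_customer_bill_to
      ON FACT_OM_INVOICE.dim_customer_bill_to_key = dim_customer_bill_to.dim_customer_bill_to_key
    GROUP BY
        dim_customer_bill_to.bill_to_code,
        YEAR(FACT_OM_INVOICE.INVOICE_DATE),
        MONTH(FACT_OM_INVOICE.INVOICE_DATE),
        FACT_OM_INVOICE.INVOICE_NUMBER
) base
WHERE RowNo <= 2;  -- top 2 invoices per customer per month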