Query tuning for performance - SQL

I need to rewrite the query below for better performance. It currently consumes more than 50,000 CPU seconds. I suspect the problem is the group by cube. Can anyone suggest how to rewrite it?
select
"platform", "subscriptions","rptg_dt","store_front_id","engaged_subscriptions",
"state_type","subscribers","qualified_subscribers","hardware_detail",
"engaged_subscribers","adam_id"
from (
select
'2020-03-16' as rptg_dt,
coalesce(abc.adam_id, 'ALL_ITEMS') as adam_id,
trim(coalesce(abc.store_front_id, 'ALL_ITEMS')) as store_front_id,
coalesce(abc.state_type, 'ALL_ITEMS') as state_type,
trim(coalesce(abc.hardware_detail, 'ALL_ITEMS')) as hardware_detail,
coalesce(abc.platform, 'ALL_ITEMS') as platform,
abc.subscriptions as subscriptions,
abc.subscribers as subscribers,
abc.qualified_subscribers as qualified_subscribers,
abc.engaged_subscribers as engaged_subscribers,
abc.engaged_subscriptions as engaged_subscriptions
from (
select store_front_id as store_front_id,
cast(cast(a.adam_id as integer) as varchar(12)) as adam_id,
case when subscn_type = 'Harmony' then 'Harmony Trial' else trim(state_type) end as state_type,
coalesce(hardware_detail, 'Unknown') as hardware_detail, --state_type,hardware_detail,platform
trim(platform_name) as platform,
count(distinct subscn_id) as subscriptions,
count(distinct acct_id) as subscribers,
count(distinct case when qualified_ind = 1 or subscn_owner_ind = 1 then acct_id end) as qualified_subscribers,
count(distinct case when engaged_ind = 1 then acct_id end) as engaged_subscribers,
count(distinct case when engaged_ind = 1 then subscn_id end) as engaged_subscriptions
from itsp_amr.atv_state_daily a
where calendar_type = 'F'
and adam_id in (1472441559,1478184786)
and a.rptg_dt = '2020-03-16'
and state_type <>'CHURNED'
group by cube /*(store_front_id,adam_id,state_type,hardware_detail,platform)*/ (1,2,3,4,5)
) abc
) mandela_temp
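For context on why the cube is a likely suspect: GROUP BY CUBE over n columns expands to 2^n grouping sets, so the five columns here give 32 of them, and COUNT(DISTINCT ...) results cannot simply be summed up from finer-grained groups. A small Python sketch that enumerates the grouping sets (the helper name cube_grouping_sets is made up for illustration, not part of any database API):

```python
from itertools import combinations

def cube_grouping_sets(columns):
    """Return every grouping set that GROUP BY CUBE(columns) expands to."""
    sets = []
    # CUBE yields one grouping set per subset of the listed columns,
    # from the full column list down to the empty set (the grand total).
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets

columns = ["store_front_id", "adam_id", "state_type", "hardware_detail", "platform"]
grouping_sets = cube_grouping_sets(columns)
print(len(grouping_sets))
for gs in grouping_sets[:3]:
    print(gs)
```

If only some of those combinations are actually reported on, listing them explicitly with GROUPING SETS instead of CUBE may reduce the work, but that depends on what the report needs.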

Advanced SQL with window function

I have table A (a dimension table) and table B (a fact table) that store shopper transaction history.
Table A: the shopper id (surrogate key) is created for each unique combination of column 2, column 3 and column 4; if that combination repeats, the row gets the same shopper id.
Table B holds the transaction data.
I am trying to identify new customers and repeat customers for each week; the expected output is below.
For repeat customers, I am thinking of the following SQL statement:
Select COUNT(*) OVER (PARTITION BY shopperid,weekdate) as total_new_shopperid
For identifying new (i.e. unique) customers in the same join condition, I am stuck on the window function.
thanks,
Sam
You can use the DENSE_RANK analytical function along with aggregate function as follows:
SELECT WEEK_DATE,
COUNT(DISTINCT CASE WHEN DR = 1 THEN SHOPPER_ID END) AS TOTAL_NEW_CUSTOMER,
SUM(CASE WHEN DR = 1 THEN AMOUNT END) AS TOTAL_NEW_CUSTOMER_AMT,
COUNT(DISTINCT CASE WHEN DR > 1 THEN SHOPPER_ID END) AS TOTAL_REPEATED_CUSTOMER,
SUM(CASE WHEN DR > 1 THEN AMOUNT END) AS TOTAL_REPEATED_CUSTOMER_AMT
FROM
(
select T.*,
DENSE_RANK() OVER (PARTITION BY SHOPPER_ID ORDER BY WEEK_DATE) AS DR
FROM YOUR_TABLE T)
GROUP BY WEEK_DATE;
Cheers!!
Tejash's answer is fine (and I'm upvoting it).
However, Oracle is quite efficient with aggregation, so two levels of aggregation might have better performance (depending on the data):
select week_date,
sum(case when min_week_date = week_date then 1 else 0 end) as new_shoppers,
sum(case when min_week_date = week_date then amount else 0 end) as new_shopper_amount,
sum(case when min_week_date < week_date then 1 else 0 end) as returning_shoppers,
sum(case when min_week_date < week_date then amount else 0 end) as returning_amount
from (select shopper_id, week_date,
sum(amount) as amount,
min(week_date) over (partition by shopper_id) as min_week_date
from t
group by shopper_id, week_date
) sw
group by week_date
order by week_date;
Note: If this has better performance, it is probably due to the elimination of count(distinct).
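To make the two-level aggregation concrete, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite stands in for Oracle only so the example is runnable; the table name t, the sample rows and the function name are assumptions, and window functions need SQLite 3.25 or later. A shopper counts as new in the week that equals their first week, and as returning in any later week.

```python
import sqlite3

def weekly_new_vs_returning(rows):
    """rows: iterable of (shopper_id, week_date, amount) tuples (assumed layout)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (shopper_id INTEGER, week_date TEXT, amount REAL)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        WITH sw AS (              -- level 1: one row per shopper and week
            SELECT shopper_id, week_date, SUM(amount) AS amount
            FROM t
            GROUP BY shopper_id, week_date
        ),
        first_seen AS (           -- attach each shopper's first week
            SELECT sw.*,
                   MIN(week_date) OVER (PARTITION BY shopper_id) AS min_week_date
            FROM sw
        )
        SELECT week_date,         -- level 2: one row per week
               SUM(CASE WHEN min_week_date = week_date THEN 1 ELSE 0 END) AS new_shoppers,
               SUM(CASE WHEN min_week_date < week_date THEN 1 ELSE 0 END) AS returning_shoppers
        FROM first_seen
        GROUP BY week_date
        ORDER BY week_date
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

sample = [
    (1, "2020-01-06", 10.0), (1, "2020-01-06", 5.0), (2, "2020-01-06", 7.0),
    (1, "2020-01-13", 3.0), (3, "2020-01-13", 4.0),
]
weekly = weekly_new_vs_returning(sample)
print(weekly)
```

Because the inner level already collapses to one row per shopper and week, the outer level can use plain sums instead of count(distinct).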

BigQuery: group counters by month after self-join

I have a table that looks like this:
I'm trying to build a query that shows counters for specific partnerId values, grouped by keywordName and month.
To solve the first part (without grouping by month), I've built this query:
SELECT keywordName, COUNT(keywordName) as total, IFNULL(b.ebay_count, 0) as ebay, IFNULL(c.amazon_count, 0) as amazon,
FROM LogFilesv2_Dataset.FR_Clickstats_v2 a
LEFT JOIN
(SELECT keywordName as kw , SUM(CASE WHEN partnerId='eBay' THEN 1 ELSE 0 END) as ebay_count
FROM LogFilesv2_Dataset.FR_Clickstats_v2
WHERE partnerId = 'eBay' GROUP BY kw) b
ON keywordName = b.kw
LEFT JOIN
(SELECT keywordName as kw , SUM(CASE WHEN partnerId='AmazonApi' THEN 1 ELSE 0 END) as amazon_count
FROM LogFilesv2_Dataset.FR_Clickstats_v2
WHERE partnerId = 'AmazonApi' GROUP BY kw) c
ON keywordName = c.kw
WHERE keywordName = 'flipper' -- just to filter out single kw.
GROUP BY keywordName, ebay, amazon
It works quite well and returns the following output:
Now I'm trying to add a grouping by month, but all my attempts have returned incorrect results.
The final output is supposed to look similar to this:
You can do this with conditional aggregation:
select
date_trunc(dt, month) dt,
keywordName,
count(*) total,
sum(case when partnerId = 'eBay' then 1 else 0 end) ebay,
sum(case when partnerId = 'AmazonApi' then 1 else 0 end) amazon
from LogFilesv2_Dataset.FR_Clickstats_v2
group by date_trunc(dt, month), keywordName
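The same conditional aggregation can be tried locally. This is only a sketch: it uses SQLite from Python instead of BigQuery, swaps DATE_TRUNC for strftime('%Y-%m', ...) because SQLite has no DATE_TRUNC, and assumes a date column named dt (as the answer does) in a stand-in table called clicks.

```python
import sqlite3

def monthly_partner_counts(rows):
    """rows: iterable of (dt, keywordName, partnerId) tuples (assumed layout)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE clicks (dt TEXT, keywordName TEXT, partnerId TEXT)")
    con.executemany("INSERT INTO clicks VALUES (?, ?, ?)", rows)
    query = """
        SELECT strftime('%Y-%m', dt) AS month,   -- stand-in for date_trunc(dt, month)
               keywordName,
               COUNT(*) AS total,
               SUM(CASE WHEN partnerId = 'eBay' THEN 1 ELSE 0 END) AS ebay,
               SUM(CASE WHEN partnerId = 'AmazonApi' THEN 1 ELSE 0 END) AS amazon
        FROM clicks
        GROUP BY strftime('%Y-%m', dt), keywordName
        ORDER BY 1, 2
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

sample = [
    ("2020-01-05", "flipper", "eBay"),
    ("2020-01-20", "flipper", "AmazonApi"),
    ("2020-01-21", "flipper", "Other"),
    ("2020-02-02", "flipper", "eBay"),
    ("2020-02-03", "flipper", "eBay"),
]
counts = monthly_partner_counts(sample)
print(counts)
```

One pass over the table replaces both self-joins, and the month simply becomes one more grouping key.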

Hive rolling sum of data over date

I am working on Hive and am facing an issue with rolling counts. The sample data I am working on is as shown below:
and the output I am expecting is as shown below:
I tried using the following query but it is not returning the rolling count:
select event_dt,status, count(distinct account) from
(select *, row_number() over (partition by account order by event_dt
desc)
as rnum from table.A
where event_dt between '2018-05-02' and '2018-05-04') x where rnum =1
group by event_dt, status;
Please help me with this if someone has solved a similar issue.
You seem to just want conditional aggregation:
select event_dt,
sum(case when status = 'Registered' then 1 else 0 end) as registered,
sum(case when status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when status = 'suspended' then 1 else 0 end) as suspended,
sum(case when status = 'reactive' then 1 else 0 end) as reactive
from table.A
group by event_dt
order by event_dt;
EDIT:
This is a tricky problem. The solution I've come up with does a cross-product of dates and users and then calculates the most recent status as of each date.
So:
select a.event_dt,
sum(case when aa.status = 'Registered' then 1 else 0 end) as registered,
sum(case when aa.status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when aa.status = 'suspended' then 1 else 0 end) as suspended,
sum(case when aa.status = 'reactive' then 1 else 0 end) as reactive
from (select d.event_dt, ac.account, a.status,
max(case when a.status is not null then a.timestamp end) over (partition by ac.account order by d.event_dt) as last_status_timestamp
from (select distinct event_dt from table.A) d cross join
(select distinct account from table.A) ac left join
(select a.*,
row_number() over (partition by account, event_dt order by timestamp desc) as seqnum
from table.A a
) a
on a.event_dt = d.event_dt and
a.account = ac.account and
a.seqnum = 1 -- get the last one on the date
) a left join
table.A aa
on aa.timestamp = a.last_status_timestamp and
aa.account = a.account
group by a.event_dt
order by a.event_dt;
What this is doing is creating a derived table with rows for all accounts and dates. This has the status on certain days, but not all days.
The cumulative max for last_status_timestamp calculates the most recent timestamp that has a valid status. This is then joined back to the table to get the status on that date. Voila! This is the status used for the conditional aggregation.
The cumulative max and join is a work-around because Hive does not (yet?) support the ignore nulls option in lag().
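The "status as of each date" idea can be checked on a toy data set. The sketch below is not the Hive query above: it runs on SQLite from Python and replaces the cumulative max with a correlated subquery that picks each account's latest timestamp on or before the date. The table layout (account, event_dt, ts, status) and the sample rows are assumptions, and timestamps are assumed unique per account.

```python
import sqlite3

def status_counts_by_date(events):
    """events: iterable of (account, event_dt, ts, status) tuples (assumed layout)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (account TEXT, event_dt TEXT, ts TEXT, status TEXT)")
    con.executemany("INSERT INTO a VALUES (?, ?, ?, ?)", events)
    query = """
        SELECT d.event_dt,
               SUM(CASE WHEN s.status = 'Registered'  THEN 1 ELSE 0 END) AS registered,
               SUM(CASE WHEN s.status = 'active_acct' THEN 1 ELSE 0 END) AS active_acct,
               SUM(CASE WHEN s.status = 'suspended'   THEN 1 ELSE 0 END) AS suspended
        FROM (SELECT DISTINCT event_dt FROM a) d          -- every date
        CROSS JOIN (SELECT DISTINCT account FROM a) ac    -- every account
        LEFT JOIN a s                                     -- latest status as of that date
               ON s.account = ac.account
              AND s.ts = (SELECT MAX(x.ts) FROM a x
                          WHERE x.account = ac.account
                            AND x.event_dt <= d.event_dt)
        GROUP BY d.event_dt
        ORDER BY d.event_dt
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

sample = [
    ("acct1", "2018-05-02", "2018-05-02 09:00", "Registered"),
    ("acct1", "2018-05-03", "2018-05-03 10:00", "active_acct"),
    ("acct2", "2018-05-02", "2018-05-02 11:00", "Registered"),
    ("acct2", "2018-05-04", "2018-05-04 08:00", "suspended"),
    ("acct3", "2018-05-03", "2018-05-03 12:00", "Registered"),
]
rolling = status_counts_by_date(sample)
print(rolling)
```

An account with no event yet on a given date matches nothing in the left join, so it contributes 0 to every counter for that date.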

Firebird SQL: query slow due to coalesce or can it be rewritten

I'm having some performance problems with a frequently used query.
SELECT
v.id,
coalesce((SELECT sum(amount) FROM artjournal WHERE variant_ref=v.id AND storage_ref=1 AND atype_ref in (1,3,4)), 0) "fv",
coalesce((SELECT sum(amount) FROM artjournal WHERE variant_ref=v.id AND storage_ref=1 AND atype_ref=2), 0) "ivo",
coalesce((SELECT sum(amount) FROM artjournal WHERE variant_ref=v.id AND storage_ref=1 AND atype_ref=5), 0) "iio",
coalesce((SELECT sum(amount * mvalue) FROM artjournal WHERE variant_ref=v.id AND storage_ref=1), 0) "vw"
FROM productvariant v
Since artjournal is a big table that gets thousands of new records each day, performance is getting terrible.
I have indices on all ID fields.
Is there a way to rewrite this statement to speed things up? Or can I use a different way to retrieve the data from the artjournal table and return 0 if the result is null?
Thanks for your thoughts,
Christiaan
Looks like you want a filtered aggregate:
SELECT v.id,
sum(case when a.atype_ref in (1,3,4) then a.amount else 0 end) as "fv",
sum(case when a.atype_ref = 2 then a.amount else 0 end) as "ivo",
sum(case when a.atype_ref = 5 then a.amount else 0 end) as "iio",
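coalesce(sum(a.amount * a.mvalue), 0) as "vw" -- 0 instead of NULL when there are no journal rows
FROM productvariant v
LEFT JOIN artjournal a ON a.variant_ref = v.id
AND a.storage_ref = 1 -- filter in the join condition so variants without matching rows are kept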
GROUP BY v.id;
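A minimal sketch of the filtered aggregate, run on SQLite from Python rather than Firebird so it can be tried locally. The table and column names follow the question; the sample rows and the function name are made up. In this sketch the storage_ref filter sits in the join condition, so a variant with no journal rows still comes back with zeros, which is what the original coalesce(..., 0) subqueries returned.

```python
import sqlite3

def stock_totals(variants, journal):
    """variants: (id,) tuples; journal: (variant_ref, storage_ref, atype_ref, amount, mvalue)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE productvariant (id INTEGER PRIMARY KEY)")
    con.execute(
        "CREATE TABLE artjournal (variant_ref INTEGER, storage_ref INTEGER, "
        "atype_ref INTEGER, amount REAL, mvalue REAL)"
    )
    con.executemany("INSERT INTO productvariant VALUES (?)", variants)
    con.executemany("INSERT INTO artjournal VALUES (?, ?, ?, ?, ?)", journal)
    query = """
        SELECT v.id,
               SUM(CASE WHEN a.atype_ref IN (1, 3, 4) THEN a.amount ELSE 0 END) AS fv,
               SUM(CASE WHEN a.atype_ref = 2 THEN a.amount ELSE 0 END) AS ivo,
               COALESCE(SUM(a.amount * a.mvalue), 0) AS vw
        FROM productvariant v
        LEFT JOIN artjournal a
               ON a.variant_ref = v.id
              AND a.storage_ref = 1      -- in ON, not WHERE, to keep unmatched variants
        GROUP BY v.id
        ORDER BY v.id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

variants = [(1,), (2,)]
journal = [
    (1, 1, 1, 10.0, 2.0),   # counted in fv and vw
    (1, 1, 2, 4.0, 1.5),    # counted in ivo and vw
    (1, 2, 1, 99.0, 1.0),   # other storage, ignored
]
totals = stock_totals(variants, journal)
print(totals)
```

The journal table is read once per query instead of once per output column per variant.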

"Timeout expired" error, when executing view in SQL Server 2008

I've written a query in SQL Server 2008. The query takes about 4 minutes to execute.
I need this query as a View. So, I've created a view with this query and when I try to execute the view creation script, it shows the following error:
Timeout Expired.
The timeout period elapsed prior to completion of the operation or the server is not responding.
The query is:
SELECT t.jrnno,
(SELECT SUM(t1.amount)
FROM dbo.T_sh AS t1
WHERE (t1.b_or_s = '1') AND (t1.jrnno = t.jrnno)) AS buy,
(SELECT SUM(t2.amount)
FROM dbo.T_sh AS t2
WHERE (t2.b_or_s = '2') AND (t2.jrnno = t.jrnno)) AS sale,
SUM(t.amount) AS Total,
SUM(t.h_crg) AS Howla,
SUM(t.l_crg) AS Laga,
SUM(t.taxamt) AS Tax,
SUM(t.commsn) AS Commission
FROM dbo.T_sh AS t
WHERE (t.tran_type = 'S')
AND (t.jrnno NOT IN (SELECT DISTINCT jrnno
FROM dbo.T_ledger))
GROUP BY t.jrnno
T_sh and T_ledger each have about 100K rows. What could be the reason, and how can I overcome this?
Update:
select
t.jrnno,
SUM(CASE WHEN t.b_or_s = 1 THEN t.amount ELSE NULL END) buy,
SUM(CASE WHEN t.b_or_s = 2 THEN t.amount ELSE NULL END) sale,
SUM(t.amount) AS Total,
SUM(t.h_crg) AS Howla,
SUM(t.l_crg) AS Laga,
SUM(t.taxamt) AS Tax,
SUM(t.commsn) AS Commission
FROM
dbo.t_sh t
WHERE
t.tran_type = 'S'
AND NOT EXISTS(SELECT 1 FROM dbo.T_ledger x where x.jrnno = t.jrnno)
group by
t.jrnno
It solved my problem. Thanks everyone for your quick response.
Try this query:
select
t.jrnno,
SUM(CASE WHEN t.b_or_s = 1 THEN t.amount ELSE NULL END) buy,
SUM(CASE WHEN t.b_or_s = 2 THEN t.amount ELSE NULL END) sale,
SUM(t.amount) AS Total,
SUM(t.h_crg) AS Howla,
SUM(t.l_crg) AS Laga,
SUM(t.taxamt) AS Tax,
SUM(t.commsn) AS Commission
FROM dbo.t_sh t
WHERE t.tran_type = 'S'
AND NOT EXISTS(SELECT 1 FROM dbo.T_ledger x WHERE x.jrnno = t.jrnno)
GROUP BY t.jrnno
Your query only needs to scan dbo.T_sh once:
SELECT t.jrnno,
SUM(CASE WHEN t.b_or_s = 1 THEN t.amount ELSE NULL END) AS buy,
SUM(CASE WHEN t.b_or_s = 2 THEN t.amount ELSE NULL END) AS sale,
SUM(t.amount) AS Total,
SUM(t.h_crg) AS Howla,
SUM(t.l_crg) AS Laga,
SUM(t.taxamt) AS Tax,
SUM(t.commsn) AS Commission
FROM dbo.T_sh AS t
WHERE t.tran_type = 'S'
AND t.jrnno NOT IN (SELECT DISTINCT
tl.jrnno
FROM dbo.T_ledger tl)
GROUP BY t.jrnno
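The answers above differ in using NOT EXISTS versus NOT IN. A small sketch on SQLite from Python (not SQL Server, and with a trimmed-down column list) shows the single-scan rewrite and one behavioural difference between the two forms: if T_ledger.jrnno ever contains NULL, which the question does not say, NOT IN returns no rows at all, while NOT EXISTS is unaffected. The sample rows and the function name are made up.

```python
import sqlite3

def journal_totals(sh_rows, ledger_rows, use_not_in=False):
    """sh_rows: (jrnno, tran_type, b_or_s, amount); ledger_rows: (jrnno,)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T_sh (jrnno INTEGER, tran_type TEXT, b_or_s TEXT, amount REAL)")
    con.execute("CREATE TABLE T_ledger (jrnno INTEGER)")
    con.executemany("INSERT INTO T_sh VALUES (?, ?, ?, ?)", sh_rows)
    con.executemany("INSERT INTO T_ledger VALUES (?)", ledger_rows)
    if use_not_in:
        anti_join = "t.jrnno NOT IN (SELECT jrnno FROM T_ledger)"
    else:
        anti_join = "NOT EXISTS (SELECT 1 FROM T_ledger x WHERE x.jrnno = t.jrnno)"
    query = f"""
        SELECT t.jrnno,
               SUM(CASE WHEN t.b_or_s = '1' THEN t.amount END) AS buy,
               SUM(CASE WHEN t.b_or_s = '2' THEN t.amount END) AS sale,
               SUM(t.amount) AS total
        FROM T_sh t
        WHERE t.tran_type = 'S'
          AND {anti_join}
        GROUP BY t.jrnno
        ORDER BY t.jrnno
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

sh = [(101, "S", "1", 50.0), (101, "S", "2", 20.0), (102, "S", "1", 30.0), (103, "B", "1", 5.0)]
ledger = [(102,), (None,)]   # note the NULL jrnno
with_not_exists = journal_totals(sh, ledger)
with_not_in = journal_totals(sh, ledger, use_not_in=True)
print(with_not_exists)
print(with_not_in)
```

With no NULLs in T_ledger.jrnno both forms return the same rows, so the choice then comes down to the execution plan.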