Improving Performance without Joins in Oracle - sql

I have 4 tables.
- Tab_active (contains active details; 30 million records)
- Tab_apr (contains customer details for April; 7 lakh, i.e. 700,000, records)
- Tab_may (contains customer details for May; 7 lakh records)
- Tab_jun (contains customer details for June; 7 lakh records)
The table structure is the same for all the tables
CustNo Revenue
1000 54.55
Now I am writing a procedure to calculate
[(Revenue for June) / (Avg Revenue for Apr, May) - 1]
for which I need the customers who are active in all 3 months (April, May and June) and who are also present in Tab_active.
Considering the size of the active customer table, I expect that using a join will degrade performance.
Is there a better way of doing this?
Thanks in advance.

I can think of one way to do this without a join, but I can't guarantee it'll be faster:
select sum(revJun) / (sum(RevMay)*0.5 + sum(RevApr)*0.5) - 1
from (select CustNo, sum(revJun) as revJun, sum(revMay) as revMay, sum(revApr) as revApr,
count(*) as NumActiveMonths
from ((select Custno, Revenue, 'Now' as which, 0.0 as RevJun, 0.0 as RevMay, 0.0 as RevApr
from tab_active
) union all
(select Custno, Revenue, 'Jun', Revenue as RevJun, 0.0 as RevMay, 0.0 as RevApr
from tab_Jun
) union all
(select Custno, Revenue, 'May', 0.0 as RevJun, Revenue as RevMay, 0.0 as RevApr
from tab_May
) union all
(select Custno, Revenue, 'Apr', 0.0 as RevJun, 0.0 as RevMay, Revenue as RevApr
from tab_Apr
)
) t
group by CustNo
) t
where NumActiveMonths = 4
I don't know that this will be faster. You'll have to experiment.
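To make the UNION ALL / GROUP BY idea easier to try out, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The sample rows are made up, the ratio is computed per customer (one possible reading of the question), and it assumes each customer appears at most once in each table, so a row count of 4 means "present in all four tables".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for name in ("tab_active", "tab_apr", "tab_may", "tab_jun"):
    cur.execute(f"CREATE TABLE {name} (CustNo INTEGER, Revenue REAL)")

# Hypothetical data: customer 1000 is in all four tables, customer 2000 has no May row.
cur.executemany("INSERT INTO tab_active VALUES (?, ?)", [(1000, 0.0), (2000, 0.0)])
cur.executemany("INSERT INTO tab_apr VALUES (?, ?)", [(1000, 40.0), (2000, 10.0)])
cur.executemany("INSERT INTO tab_may VALUES (?, ?)", [(1000, 60.0)])
cur.executemany("INSERT INTO tab_jun VALUES (?, ?)", [(1000, 75.0), (2000, 30.0)])

query = """
SELECT CustNo,
       SUM(RevJun) / ((SUM(RevApr) + SUM(RevMay)) / 2.0) - 1 AS growth
FROM (
    SELECT CustNo, 0.0 AS RevJun, 0.0 AS RevMay, 0.0 AS RevApr FROM tab_active
    UNION ALL
    SELECT CustNo, Revenue, 0.0, 0.0 FROM tab_jun
    UNION ALL
    SELECT CustNo, 0.0, Revenue, 0.0 FROM tab_may
    UNION ALL
    SELECT CustNo, 0.0, 0.0, Revenue FROM tab_apr
) t
GROUP BY CustNo
HAVING COUNT(*) = 4  -- one row from each of the four tables
"""
rows = cur.execute(query).fetchall()
print(rows)
```

Only customer 1000 survives the HAVING filter here. Whether this beats a join on 30 million rows is something only a test against the real tables can tell.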

Related

Querying revenue by percentile in googlesql

I'm trying to group companies and their revenues by percentile (>90% as Top, 50-90% as Middle, <50% as Bottom) in GoogleSQL.
Table format for revenue_table:
|company | product | revenue |
------------------------------
I'm trying out doing a cross join to split these companies up:
SELECT
company,
SUM(revenue) as revenue,
CASE
WHEN SUM(revenue) > Percentile90_Max THEN "Top"
WHEN SUM(revenue) >= Percentile50_Max THEN "Middle"
ELSE "Bottom"
END as percentile_section,
Percentile50_Max,
Percentile90_Max,
FROM revenue_table
CROSS JOIN
(SELECT
APPROX_QUANTILES(revenue,100)[offset(50)] As Percentile50_Max,
APPROX_QUANTILES(revenue,100)[offset(90)] As Percentile90_Max
FROM
(SELECT
company,
SUM(revenue) as revenue
FROM revenue_table
GROUP BY 1
)
)
GROUP BY 1,4,5
Order by 2 desc
The code above currently works, but it breaks once I change my main select statement to:
SELECT
company,
--SUM(revenue) as revenue,
CASE
WHEN SUM(revenue) > Percentile90_Max THEN "Top"
WHEN SUM(revenue) >= Percentile50_Max THEN "Middle"
ELSE "Bottom"
END as percentile_section,
--Percentile50_Max,
--Percentile90_Max,
... same code here
GROUP BY 1
Ideally my end result should just be Company + percentile_section.
How should I do this without adding another subquery? Or is a subquery really the only way to go in terms of query efficiency?
Thank you!
You should be able to calculate the quantiles as part of the aggregation, so no subquery should be necessary:
SELECT company, SUM(revenue) as revenue,
(CASE WHEN SUM(revenue) > (APPROX_QUANTILES(SUM(revenue), 100) OVER ())[offset(90)] THEN 'Top'
WHEN SUM(revenue) >= (APPROX_QUANTILES(SUM(revenue), 100) OVER ())[offset(50)] THEN 'Middle'
ELSE 'Bottom'
END) as percentile_section
FROM revenue_table
GROUP BY 1
Order by 2 desc
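The bucketing logic itself (aggregate per company, derive two thresholds from the aggregated totals, then classify) can be checked outside the database. Below is a rough Python sketch with made-up rows. Note that it uses a simple nearest-rank percentile, which is an assumption for illustration and not the same estimator as APPROX_QUANTILES.

```python
import math
from collections import defaultdict

# Hypothetical (company, product, revenue) rows; company c<i> totals i * 10.
rows = [(f"c{i}", "p1", i * 6.0) for i in range(1, 11)]
rows += [(f"c{i}", "p2", i * 4.0) for i in range(1, 11)]

# Step 1: SUM(revenue) ... GROUP BY company
totals = defaultdict(float)
for company, _product, revenue in rows:
    totals[company] += revenue


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct is an integer 1..100)."""
    idx = max(math.ceil(pct * len(sorted_values) / 100) - 1, 0)
    return sorted_values[idx]


# Step 2: thresholds computed over the per-company totals, not the raw rows
ordered = sorted(totals.values())
p50 = nearest_rank(ordered, 50)
p90 = nearest_rank(ordered, 90)

# Step 3: the CASE expression from the question
buckets = {
    company: "Top" if total > p90 else "Middle" if total >= p50 else "Bottom"
    for company, total in totals.items()
}
print(p50, p90, buckets)
```

The point of the sketch is only that the thresholds have to be computed over the grouped totals, which is why the original query needed the inner GROUP BY.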

Postgresql - Aggregate queries inside aggregate queries

I'm working on building a select statement for a sales rep commission report that uses postgresql tables. I want it to show these columns:
-Customer No.
-Part No.
-Month-to-date Qty (MTD Qty)
-Year-to-date Qty (YTD Qty)
-Month-to-date Extended Selling Price (MTD Extended)
-Year-to-date Extended Selling Price (YTD Extended)
The data is in two tables:
Sales_History (one record per invoice and this table includes Cust. No. and Invoice Date)
Sales_History_Items (one record per part no. per invoice and this table includes Part No., Qty and Unit Price).
If I do a simple query that combines these two tables, this is what it looks like:
Date / Cust / Part / Qty / Unit Price
Apr 1 / ABC Co. / WIDGET / 5 / $11
Apr 4 / ABC Co. / WIDGET / 8 / $11.50
Apr 1 / ABC Co. / GADGET / 1 / $30
Apr 7 / XYZ Co. / WIDGET / 3 / $11.50
etc.
This is the final result I want (one line per customer per part):
Cust / Part / MTD Qty / MTD Sales / YTD Qty / YTD Sales
ABC Co. / WIDGET / 13 / $147 / 1500 / $16,975
ABC Co. / GADGET / 1 / $30 / 7 / $210
XYZ Co. / WIDGET / 3 / $34.50 / 18 / $203.40
I’ve been able to come up with this SQL statement so far, which does not get me the extended selling columns (committed_qty * unit_price) per line and then summarize them by cust no./part no., and that’s my problem:
with mtd as
(SELECT sales_history.cust_no, part_no, Sum(sales_history_items.committed_qty) AS MTDQty
FROM sales_history left JOIN sales_history_items
ON sales_history.invoice_no = sales_history_items.invoice_no where sales_history_items.part_no is not null and sales_history.invoice_date >= '2020-04-01' and sales_history.invoice_date <= '2020-04-30'
GROUP BY sales_history.cust_no, sales_history_items.part_no),
ytd as
(SELECT sales_history.cust_no, part_no, Sum(sales_history_items.committed_qty) AS YTDQty
FROM sales_history left JOIN sales_history_items
ON sales_history.invoice_no = sales_history_items.invoice_no where sales_history_items.part_no is not null and sales_history.invoice_date >= '2020-01-01' and sales_history.invoice_date <= '2020-12-31' GROUP BY sales_history.cust_no, sales_history_items.part_no),
mysummary as
(select MTDQty, YTDQty, coalesce(ytd.cust_no,mtd.cust_no) as cust_no,coalesce(ytd.part_no,mtd.part_no) as part_no
from ytd full outer join mtd on ytd.cust_no=mtd.cust_no and ytd.part_no=mtd.part_no)
select * from mysummary;
I believe that I have to nest another couple of aggregate queries in here that would group by cust_no, part_no, unit_price but then have those extended price totals (qty * unit_price) sum up by cust_no, part_no.
Any assistance would be greatly appreciated. Thanks!
Do this in one go with filter expressions:
with params as (
select '2020-01-01'::date as year, 4 as month
)
SELECT h.cust_no, i.part_no,
SUM(i.committed_qty) AS YTDQty,
SUM(i.committed_qty * i.unit_price) as YTDSales,
SUM(i.committed_qty) FILTER
(WHERE extract('month' from h.invoice_date) = p.month) as MTDQty,
SUM(i.committed_qty * i.unit_price) FILTER
(WHERE extract('month' from h.invoice_date) = p.month) as MTDSales
FROM params p
CROSS JOIN sales_history h
LEFT JOIN sales_history_items i
ON i.invoice_no = h.invoice_no
WHERE i.part_no is not null
AND h.invoice_date >= p.year
AND h.invoice_date < p.year + interval '1 year'
GROUP BY h.cust_no, i.part_no
If I follow you correctly, you can do conditional aggregation:
select sh.cust_no, shi.part_no,
sum(shi.committed_qty) ytd_qty,
sum(shi.committed_qty * shi.unit_price) ytd_sales,
sum(shi.committed_qty) filter(where sh.invoice_date >= date_trunc('month', current_date)) mtd_qty,
sum(shi.committed_qty * shi.unit_price) filter(where sh.invoice_date >= date_trunc('month', current_date)) mtd_sales
from sales_history sh
left join sales_history_items shi on sh.invoice_no = shi.invoice_no
where shi.part_no is not null and sh.invoice_date >= date_trunc('year', current_date)
group by sh.cust_no, shi.part_no
The logic is to filter on the current year, and use simple aggregation to compute the "year to date" figures. To get the "month to date" columns, we can just filter the aggregate functions.
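For anyone who wants to experiment with conditional aggregation without a PostgreSQL instance, here is a small sketch using Python's sqlite3 module. It is only an approximation of the answers above: it writes the filter as SUM(CASE WHEN ...) instead of FILTER, hard-codes April 2020 instead of using current_date, and the invoice numbers and the February row are invented. Table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_history (invoice_no INTEGER, cust_no TEXT, invoice_date TEXT);
CREATE TABLE sales_history_items (
    invoice_no INTEGER, part_no TEXT, committed_qty INTEGER, unit_price REAL);
INSERT INTO sales_history VALUES
    (1, 'ABC', '2020-02-10'), (2, 'ABC', '2020-04-01'),
    (3, 'ABC', '2020-04-04'), (4, 'XYZ', '2020-04-07');
INSERT INTO sales_history_items VALUES
    (1, 'WIDGET', 10, 10.0), (2, 'WIDGET', 5, 11.0),
    (3, 'WIDGET', 8, 11.5), (4, 'WIDGET', 3, 11.5);
""")

query = """
SELECT h.cust_no, i.part_no,
       SUM(i.committed_qty)                AS ytd_qty,
       SUM(i.committed_qty * i.unit_price) AS ytd_sales,
       -- month-to-date: same aggregate, restricted to April 2020
       SUM(CASE WHEN h.invoice_date >= '2020-04-01' AND h.invoice_date < '2020-05-01'
                THEN i.committed_qty ELSE 0 END) AS mtd_qty,
       SUM(CASE WHEN h.invoice_date >= '2020-04-01' AND h.invoice_date < '2020-05-01'
                THEN i.committed_qty * i.unit_price ELSE 0 END) AS mtd_sales
FROM sales_history h
JOIN sales_history_items i ON i.invoice_no = h.invoice_no
WHERE h.invoice_date >= '2020-01-01' AND h.invoice_date < '2021-01-01'
GROUP BY h.cust_no, i.part_no
ORDER BY h.cust_no, i.part_no
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The line-level multiplication (committed_qty * unit_price) happens inside the SUM, so no extra nesting by unit_price is needed before summarising by customer and part.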

SQL How to subtract from only the MIN value

My query is pretty simple:
SELECT SUM(rate), SUM(nights), SUM(rate) * SUM(nights) AS subtotal, (SUM(rate) * SUM(nights)) * MIN(tax) AS tax,
SUM(adults), SUM(fee), ((SUM(rate) * SUM(nights)) * MIN(tax)) + SUM(fee) AS total
FROM table
WHERE group = #group_code
GROUP BY rate
The results of my query:
rate nights subtotal tax adults fee total
0.00 14 0.00 0.00 21 105 105.00
154.00 226 34804.00 5373.04 141 705 40882.04
254.00 6 1524.00 235.27 4 20 1779.27
I want to be able to subtract 2 adults from whichever row has the MIN(rate), in this case the row with a rate of $0.00, but leave the other rows alone.
Anyone able to help?
If I understood properly this would do the trick:
with s as (
select
SUM(rate) rate,
SUM(nights) nights,
SUM(rate) * SUM(nights) subtotal,
(SUM(rate) * SUM(nights)) * MIN(tax) tax,
SUM(adults) adults,
SUM(fee) fee,
((SUM(rate) * SUM(nights)) * MIN(tax)) + SUM(fee) total
from table
where group = #group_code
group by rate
), sorted as (
select s.rate, s.nights, s.subtotal, s.tax, s.adults, s.fee, s.total, ROW_NUMBER() over(order by s.rate) lp
from s
)
select sorted.rate,
sorted.nights,
sorted.subtotal,
sorted.tax,
sorted.adults + IIF(lp = 1, -2, 0) as adults,
sorted.fee,
sorted.total
from sorted
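The same "rank the grouped rows, then adjust only rank 1" idea, sketched in plain Python on the result rows from the question (only the columns needed for the illustration are kept):

```python
# Grouped rows as shown in the question's result set.
rows = [
    {"rate": 0.00, "nights": 14, "adults": 21, "fee": 105},
    {"rate": 154.00, "nights": 226, "adults": 141, "fee": 705},
    {"rate": 254.00, "nights": 6, "adults": 4, "fee": 20},
]

# Equivalent of ROW_NUMBER() OVER (ORDER BY rate): position 0 is the MIN(rate) row.
ranked = sorted(rows, key=lambda r: r["rate"])

# Subtract 2 adults from the first-ranked row only, leave the others untouched.
adjusted = [
    dict(r, adults=r["adults"] - 2 if pos == 0 else r["adults"])
    for pos, r in enumerate(ranked)
]
print([r["adults"] for r in adjusted])
```

If two groups could tie on the minimum rate, only one of them gets adjusted, which matches what ROW_NUMBER does.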

Cohort/ Retention query in BigQuery using Google Analytics exported data

I need help formulating a cohort/retention query
I am trying to build a query to look at visitors who performed ActionX on their first visit (in the time frame) and then how many days later they returned to perform Action X again
The output I (eventually) need looks like this...
The table I am dealing with is an export of Google Analytics to BigQuery
Could anyone help me with this, or share a similar query that I can adapt?
Thanks
Just to give you simple idea / direction
Below is for BigQuery Standard SQL
#standardSQL
SELECT
Date_of_action_first_taken,
ROUND(100 * later_1_day / Visits) AS later_1_day,
ROUND(100 * later_2_days / Visits) AS later_2_days,
ROUND(100 * later_3_days / Visits) AS later_3_days
FROM `OutputFromQuery`
You can test it with below dummy data from your question
#standardSQL
WITH `OutputFromQuery` AS (
SELECT '01.07.17' AS Date_of_action_first_taken, 1000 AS Visits, 800 AS later_1_day, 400 AS later_2_days, 300 AS later_3_days UNION ALL
SELECT '02.07.17', 1000, 860, 780, 860 UNION ALL
SELECT '29.07.17', 1000, 780, 120, 0 UNION ALL
SELECT '30.07.17', 1000, 710, 0, 0
)
SELECT
Date_of_action_first_taken,
ROUND(100 * later_1_day / Visits) AS later_1_day,
ROUND(100 * later_2_days / Visits) AS later_2_days,
ROUND(100 * later_3_days / Visits) AS later_3_days
FROM `OutputFromQuery`
The OutputFromQuery data is as below:
Date_of_action_first_taken Visits later_1_day later_2_days later_3_days
01.07.17 1000 800 400 300
02.07.17 1000 860 780 860
29.07.17 1000 780 120 0
30.07.17 1000 710 0 0
and the final output is:
Date_of_action_first_taken later_1_day later_2_days later_3_days
01.07.17 80.0 40.0 30.0
02.07.17 86.0 78.0 86.0
29.07.17 78.0 12.0 0.0
30.07.17 71.0 0.0 0.0
I found this query in the talk "Turn Your App Data into Answers with Firebase and BigQuery" (Google I/O'19).
It should work :)
#standardSQL
###################################################
# Part 1: Cohort of New Users Starting on DEC 24
###################################################
WITH
new_user_cohort AS (
SELECT DISTINCT
user_pseudo_id as new_user_id
FROM
`[your_project].[your_firebase_table].events_*`
WHERE
event_name = '[chosen_event]' AND
#set the date from when starting cohort analysis
FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_TRUNC(TIMESTAMP_MICROS(event_timestamp), DAY, "Etc/GMT+1")) = '20191224' AND
_TABLE_SUFFIX BETWEEN '20191224' AND '20191230'
),
num_new_users AS (
SELECT count(*) as num_users_in_cohort FROM new_user_cohort
),
#############################################
# Part 2: Engaged users from Dec 24 cohort
#############################################
engaged_users_by_day AS (
SELECT
FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_TRUNC(TIMESTAMP_MICROS(event_timestamp), DAY, "Etc/GMT+1")) as event_day,
COUNT(DISTINCT user_pseudo_id) as num_engaged_users
FROM
`[your_project].[your_firebase_table].events_*`
INNER JOIN
new_user_cohort ON new_user_id = user_pseudo_id
WHERE
event_name = 'user_engagement' AND
_TABLE_SUFFIX BETWEEN '20191224' AND '20191230'
GROUP BY
event_day
)
####################################################################
# Part 3: Daily Retention = [Engaged Users / Total Users]
####################################################################
SELECT
event_day,
num_engaged_users,
num_users_in_cohort,
ROUND((num_engaged_users / num_users_in_cohort), 3) as retention_rate
FROM
engaged_users_by_day
CROSS JOIN
num_new_users
ORDER BY
event_day
So I think I may have cracked it... from this output I then would need to manipulate it (pivot table it) to make it look like the desired output.
Can anyone review this for me and let me know what you think?
WITH
cohort_items AS (
SELECT
MIN( TIMESTAMP_TRUNC(TIMESTAMP_MICROS((visitStartTime*1000000 +
h.time*1000)), DAY) ) AS cohort_day, fullVisitorID
FROM
TABLE123 AS U,
UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN "20170701" AND "20170731"
AND 'ACTION TAKEN'
GROUP BY 2
),
user_activites AS (
SELECT
A.fullVisitorID,
DATE_DIFF(DATE(TIMESTAMP_TRUNC(TIMESTAMP_MICROS((visitStartTime*1000000 + h.time*1000)), DAY)), DATE(C.cohort_day), DAY) AS day_number
FROM `Table123` A
LEFT JOIN cohort_items C ON A.fullVisitorID = C.fullVisitorID,
UNNEST(hits) AS h
WHERE
A._TABLE_SUFFIX BETWEEN "20170701" AND "20170731"
AND 'ACTION TAKEN'
GROUP BY 1,2),
cohort_size AS (
SELECT
cohort_day,
count(1) as number_of_users
FROM
cohort_items
GROUP BY 1
ORDER BY 1
),
retention_table AS (
SELECT
C.cohort_day,
A.day_number,
COUNT(1) AS number_of_users
FROM
user_activites A
LEFT JOIN cohort_items C ON A.fullVisitorID = C.fullVisitorID
GROUP BY 1,2
)
SELECT
B.cohort_day,
S.number_of_users as total_users,
B.day_number,
B.number_of_users / S.number_of_users as percentage
FROM retention_table B
LEFT JOIN cohort_size S ON B.cohort_day = S.cohort_day
WHERE B.cohort_day IS NOT NULL
ORDER BY 1, 3
Thank you in advance!
If you use some of the techniques available in BigQuery, you can potentially solve this type of problem in a very cost-effective and performant way. As an example:
SELECT
init_date,
ARRAY((SELECT AS STRUCT days, freq, ROUND(freq * 100 / MAX(freq) OVER(), 2) FROM UNNEST(data) ORDER BY days)) data
FROM(
SELECT
init_date,
ARRAY_AGG(STRUCT(days, freq)) data
FROM(
SELECT
init_date,
data AS days,
COUNT(data) freq
FROM(
SELECT
init_date,
ARRAY(SELECT DATE_DIFF(PARSE_DATE("%Y%m%d", dts), PARSE_DATE("%Y%m%d", init_date), DAY) AS dt FROM UNNEST(dts) dts) data
FROM(
SELECT
MIN(date) init_date,
ARRAY_AGG(DISTINCT date) dts
FROM `Table123`
WHERE TRUE
AND EXISTS(SELECT 1 FROM UNNEST(hits) where eventinfo.eventCategory = 'recommendation') -- This is your 'ACTION TAKEN' filter
AND _TABLE_SUFFIX BETWEEN "20170724" AND "20170731"
GROUP BY fullvisitorid
)
),
UNNEST(data) data
GROUP BY init_date, days
)
GROUP BY init_date
)
I tested this query against our G.A data and selected customers who interacted with our recommendation system (as you can see in the filter selection WHERE EXISTS...). Example of result (omitted absolute values of freq for privacy reasons):
As you can see, on the 28th, for instance, 8% of customers came back 1 day later and interacted with the system again.
I recommend you to play around with this query and see if it works well for you. It's simpler, cheaper, faster and hopefully easier to maintain.
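To see what the nested query is doing step by step, here is a rough Python sketch of the same logic on made-up data: collect the distinct action dates per visitor, take the earliest as the cohort day, count visitors per "days since first action", and express each count as a percentage of the day-0 count. The visitor ids and dates are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical (visitor_id, date on which the action was performed) pairs.
events = [
    ("a", date(2017, 7, 1)), ("a", date(2017, 7, 2)), ("a", date(2017, 7, 4)),
    ("b", date(2017, 7, 1)), ("b", date(2017, 7, 2)),
    ("c", date(2017, 7, 2)), ("c", date(2017, 7, 3)),
    ("d", date(2017, 7, 1)),
]

# ARRAY_AGG(DISTINCT date) ... GROUP BY fullvisitorid
days_by_visitor = defaultdict(set)
for visitor, day in events:
    days_by_visitor[visitor].add(day)

# cohort day -> {days since first action: number of visitors}
cohorts = defaultdict(Counter)
for days in days_by_visitor.values():
    first = min(days)  # MIN(date) init_date
    for day in days:
        cohorts[first][(day - first).days] += 1

# Percentage relative to the day-0 count, which is the size of the cohort.
retention = {
    cohort: {offset: round(100 * n / counts[0], 2) for offset, n in sorted(counts.items())}
    for cohort, counts in cohorts.items()
}
for cohort in sorted(retention):
    print(cohort, retention[cohort])
```

In this toy data, three visitors start on 1 July and two of them come back the next day, so day 1 shows roughly 66.67% for that cohort.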

SQL Year over year growth percentage from data same query

How do I calculate the percentage difference from 2 different columns, calculated in that same query? Is it even possible?
This is what I have right now:
SELECT
Year(OrderDate) AS [Year],
Count(OrderID) AS TotalOrders,
Sum(Invoice.TotalPrice) AS TotalRevenue
FROM
Invoice
INNER JOIN Order
ON Invoice.InvoiceID = Order.InvoiceID
GROUP BY Year(OrderDate);
Which produces this table
Now I'd like to add one more column with the YoY growth, so that when 2016 comes around, its growth is calculated as well.
EDIT:
I should clarify that I'd like to have for example next to
2015,5,246.28 -> 346,15942029% ((R2015-R2014) / R2014 * 100)
If you save your existing query as qryBase, you can use it as the data source for another query to get what you want:
SELECT
q1.Year,
q1.TotalOrders,
q1.TotalRevenue,
IIf
(
q0.TotalRevenue Is Null,
Null,
((q1.TotalRevenue - q0.TotalRevenue) / q0.TotalRevenue) * 100
) AS YoY_growth
FROM
qryBase AS q1
LEFT JOIN qryBase AS q0
ON q1.Year = (q0.Year + 1);
Access may complain it "can't represent the join expression q1.Year = (q0.Year + 1) in Design View", but you can still edit the query in SQL View and it will work.
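The self-join on Year = Year + 1 is not specific to Access. As a quick way to check the arithmetic, here is a sketch using Python's sqlite3 module, with qryBase materialised as a table. The 2014 and 2015 revenues match the figures in the question's edit; the order counts for 2014 and 2016 and the 2016 revenue are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qryBase (Year INTEGER, TotalOrders INTEGER, TotalRevenue REAL)")
conn.executemany(
    "INSERT INTO qryBase VALUES (?, ?, ?)",
    [(2014, 2, 55.20), (2015, 5, 246.28), (2016, 7, 350.47)],
)

rows = conn.execute("""
SELECT q1.Year, q1.TotalRevenue,
       (q1.TotalRevenue - q0.TotalRevenue) / q0.TotalRevenue * 100 AS YoY_growth
FROM qryBase AS q1
LEFT JOIN qryBase AS q0 ON q1.Year = q0.Year + 1  -- q0 is the previous year
ORDER BY q1.Year
""").fetchall()

# The first year has no predecessor, so the LEFT JOIN yields NULL (None in Python).
growth = {year: (None if g is None else round(g, 2)) for year, _revenue, g in rows}
print(growth)
```

In SQLite the NULL from the unmatched join side propagates through the arithmetic on its own, so the explicit IIf check is not needed there.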
Are you looking for something like this?
Year Revenue Growth
2014 55
2015 246 4.47
2016 350 1.42
You could wrap the original query twice to get the numbers from both years.
select orders.year, orders.orders, orders.revenue,
(select (orders.revenue/subOrders.revenue)
from
(
--originalQuery or table link
) subOrders
where subOrders.year = (orders.year-1)
) as lastYear
from
(
--originalQuery or table link
) orders
Here's a cheap union'd table example:
select orders.year, orders.orders, orders.revenue,
(select (orders.revenue/subOrders.revenue)
from
(
select 2014 as year, 2 as orders, 55.20 as revenue
union select 2015 as year, 2 as orders, 246.28 as revenue
union select 2016 as year, 7 as orders, 350.47 as revenue
) subOrders
where subOrders.year = (orders.year-1)
) as lastYear
from
(
select 2014 as year, 2 as orders, 55.20 as revenue
union select 2015 as year, 2 as orders, 246.28 as revenue
union select 2016 as year, 7 as orders, 350.47 as revenue
) orders