Finding Conversion Rate - sql

I have two column tables with subscription result for each project. I need to know the conversion rate for each experiment.
My tables are:
And my desired result is
A conversion is any user for whom there is a subscription start event in addition to the trial start event (all users have a trial start event). If a user is in multiple experiments at the same time, it's OK to count them towards the conversion rate of each experiment. I also want to return only one row per experiment.
I have done an inner join to get the subscription event and experiment id together, but I am unable to proceed further. Any help is appreciated.
My inner join table looks like

I think this is a left join with some filtering and aggregation:
select e.experiment_id,
(count(case when experiment_assignment = 'test' then s.user_id end) * 1.0 /
sum(case when experiment_assignment = 'test' then 1 else 0 end)
) as test_conversion_rate,
(count(case when experiment_assignment = 'control' then s.user_id end) * 1.0 /
sum(case when experiment_assignment = 'control' then 1 else 0 end)
) as control_conversion_rate
from experiment e left join
subscription s
on s.user_id = e.user_id and
s.subscription_event = 'subscription_start'
group by e.experiment_id;
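To check the left join approach on a small data set, here is a minimal sketch that runs the same query in an in-memory SQLite database through Python's sqlite3 module. The table and column names follow the query above; the column types, the sample rows, and the use of SQLite are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE experiment (user_id INTEGER, experiment_id INTEGER,
                             experiment_assignment TEXT);
    CREATE TABLE subscription (user_id INTEGER, subscription_event TEXT);

    -- Hypothetical data: users 1-4 are in 'test', users 5-6 in 'control'.
    INSERT INTO experiment VALUES
        (1, 10, 'test'), (2, 10, 'test'), (3, 10, 'test'), (4, 10, 'test'),
        (5, 10, 'control'), (6, 10, 'control');

    -- Every user has a trial_start; only users 1 and 5 converted.
    INSERT INTO subscription VALUES
        (1, 'trial_start'), (2, 'trial_start'), (3, 'trial_start'),
        (4, 'trial_start'), (5, 'trial_start'), (6, 'trial_start'),
        (1, 'subscription_start'), (5, 'subscription_start');
""")

query = """
    SELECT e.experiment_id,
           COUNT(CASE WHEN e.experiment_assignment = 'test' THEN s.user_id END) * 1.0
             / SUM(CASE WHEN e.experiment_assignment = 'test' THEN 1 ELSE 0 END)
             AS test_conversion_rate,
           COUNT(CASE WHEN e.experiment_assignment = 'control' THEN s.user_id END) * 1.0
             / SUM(CASE WHEN e.experiment_assignment = 'control' THEN 1 ELSE 0 END)
             AS control_conversion_rate
    FROM experiment e
    LEFT JOIN subscription s
           ON s.user_id = e.user_id
          AND s.subscription_event = 'subscription_start'
    GROUP BY e.experiment_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # one row per experiment: (experiment_id, test rate, control rate)
```

The left join keeps every assigned user in the denominator, while COUNT(s.user_id) only counts the users who matched a subscription_start row. This assumes each user has at most one subscription_start event; duplicates would inflate both counts.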

select e.experiment_id,
(count(case when e.experiment_assignment = 'test' and s.subscription_event = 'subscription_start' then s.user_id end) * 100.0 /
count(case when e.experiment_assignment = 'test' and s.subscription_event = 'trial_start' then s.user_id end)
) as test_conversion_rate,
(count(case when e.experiment_assignment = 'control' and s.subscription_event = 'subscription_start' then s.user_id end) * 100.0 /
count(case when e.experiment_assignment = 'control' and s.subscription_event = 'trial_start' then s.user_id end)
) as control_conversion_rate
from experiment e inner join subscription s
on s.user_id = e.user_id
group by e.experiment_id;

Related

Joining two aggregate queries from the same table - SQL Server

I have two queries, both are aggregated from the same table. I'm not sure if I have to join these two queries together or if it can be done with one select statement. The goal is to output a table that aggregates total charges and total refunds for each student.
Query #1:
select
s.learners_id, sum(charge.total_amount) charge_amount
from
fact_student_transactions_t charge
inner join
dim_students_t s on s.students_sk_id = charge.students_sk_id
left join
object_statuses_t os on os.object_statusid = charge.transaction_status_id
where
os.status_name = 'Success'
and charge.tran_type = 'CHARGE'
and charge.curr_in = 1
group by
s.learners_id
Query #2:
select
s.learners_id, sum(refund.total_amount) refund_amount
from
fact_student_transactions_t refund
inner join
dim_students_t s on s.students_sk_id = refund.students_sk_id
left join
object_statuses_t os on os.object_statusid = refund.transaction_status_id
where
os.status_name = 'Success'
and refund.tran_type = 'Refund'
and refund.trans_description not in ('Amount Successfully Transfered to Prepaid Balance.', 'Amount Successfully Transfered.')
and refund.payment_method != 'Transfer'
and refund.curr_in = 1
group by
s.learners_id
You can use conditional aggregation. Also, because of the nature of the where clause, the left join is unnecessary -- the unmatched records are filtered out anyway.
So:
select s.learners_id,
sum(case when st.tran_type = 'CHARGE' then st.total_amount else 0 end) as charge_amount,
sum(case when st.tran_type = 'Refund' and
st.trans_description not in ('Amount Successfully Transfered to Prepaid Balance.', 'Amount Successfully Transfered.')
and
st.payment_method <> 'Transfer'
then st.total_amount else 0
end) as refund_amount
from fact_student_transactions_t st join
dim_students_t s
on s.students_sk_id = st.students_sk_id join
object_statuses_t os
on os.object_statusid = st.transaction_status_id
where os.status_name = 'Success' and
st.curr_in = 1 and
st.tran_type in ('CHARGE', 'Refund')
group by s.learners_id
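As a quick check of the conditional aggregation, the sketch below runs the combined query against an in-memory SQLite database from Python. The table and column names come from the question; the column types and every sample row are hypothetical, and SQLite stands in for SQL Server purely so the example is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_students_t (students_sk_id INTEGER, learners_id TEXT);
    CREATE TABLE object_statuses_t (object_statusid INTEGER, status_name TEXT);
    CREATE TABLE fact_student_transactions_t (
        students_sk_id INTEGER, tran_type TEXT, total_amount INTEGER,
        curr_in INTEGER, transaction_status_id INTEGER,
        trans_description TEXT, payment_method TEXT);

    INSERT INTO dim_students_t VALUES (1, 'L1'), (2, 'L2');
    INSERT INTO object_statuses_t VALUES (1, 'Success'), (2, 'Failed');

    -- Hypothetical transactions.
    INSERT INTO fact_student_transactions_t VALUES
        (1, 'CHARGE', 100, 1, 1, 'Course fee', 'Card'),
        (1, 'Refund',  30, 1, 1, 'Refund issued', 'Card'),
        (1, 'Refund',  20, 1, 1, 'Amount Successfully Transfered.', 'Card'),
        (1, 'CHARGE',  50, 1, 2, 'Course fee', 'Card'),      -- failed status
        (2, 'CHARGE',  80, 1, 1, 'Course fee', 'Card'),
        (2, 'Refund',  10, 1, 1, 'Refund issued', 'Transfer');
""")

query = """
    SELECT s.learners_id,
           SUM(CASE WHEN st.tran_type = 'CHARGE' THEN st.total_amount ELSE 0 END)
             AS charge_amount,
           SUM(CASE WHEN st.tran_type = 'Refund'
                     AND st.trans_description NOT IN
                         ('Amount Successfully Transfered to Prepaid Balance.',
                          'Amount Successfully Transfered.')
                     AND st.payment_method <> 'Transfer'
                    THEN st.total_amount ELSE 0 END) AS refund_amount
    FROM fact_student_transactions_t st
    JOIN dim_students_t s ON s.students_sk_id = st.students_sk_id
    JOIN object_statuses_t os ON os.object_statusid = st.transaction_status_id
    WHERE os.status_name = 'Success'
      AND st.curr_in = 1
      AND st.tran_type IN ('CHARGE', 'Refund')
    GROUP BY s.learners_id
    ORDER BY s.learners_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # (learners_id, charge_amount, refund_amount) per student
```

One difference from joining the two original queries: a student whose only refunds are excluded ones still appears here, with a refund_amount of 0.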

Order by sequence number in case - possibility

I have a somewhat complicated SQL report for my HR department, and one of the fields shows whether someone has social insurance.
I must have a CASE for it (if some assumptions are met then show Yes, otherwise No). The problem is that people are employed more than once and do not always want to have social insurance.
the data looks like:
https://fojlww.am.files.1drv.com/y4mawp7Xs7HahMkb_h_bP7xuD_UIHAfDF3dvZ1iPLD5kGMFjHpcJpEDUD3g8TyNmT5mxgZbU6LLJPhWGivDtZEy8i4e3lz32jMBIB7yw5MzRO4U5PiGdoWtbIT02Qdk-9_eGfxTUgkGcE-g4JNQ80C6TK2PSUmlIzBlTmo99knOJotmSnLbqJevrF5CB3jmdKtLVGEfdDY4dLHoCcnWWZV10Q?width=1306&height=320&cropmode=none
(CASE WHEN (select s.emp_no from SSI_DATA s where
s.C_NO_CONTRIBUTIONS = 'FALSE' and OBLIG_PENSION_INSUR = 'TRUE' and s.emp_no = cp.emp_no and s.company_id = cp.company_id and
(s.C_NO_CONTRIB_SINCE <= '2019-10-10' or s.C_NO_CONTRIB_SINCE is null or s.C_NO_CONTRIB_SINCE = 'FALSE' ) and rownum = 1
order by s.seq desc
--and s.seq = ( select max(seq) from SSI_DATA where emp_no = cp.emp_no and company_id = cp.company_id)
) = cp.emp_no
THEN 'Y'
ELSE 'N' END)

Multi Column CASE WHEN in Teradata

I'm trying to consolidate a few sub-queries to avoid hitting a massive table (42B rows) multiple times and getting
"[3771] Illegal Expression in WHEN clause of CASE expression."
,SUM(CASE
WHEN (oh.LOCN_NBR,oh.WK_NBR) IN (SELECT LOCN_NBR,START_WK FROM VT_STORES)
THEN oh.TTL_UN_QT
ELSE NULL
END) AS BEGINNING_OH
Is there any way to do multi-column IN statements within a CASE statement, or am I stuck putting these in the join/where in a subquery as it is currently?
Edit: Full Query as requested:
SELECT
oh.LOCN_NBR AS LOCN_NBR
,item.ITEM_ID AS ITEM_ID
,SUM(CASE
WHEN oh.WK_NBR = (SELECT WK_NBR FROM ALEX_ARP_VIEWS_PRD.REF_CUSTOM_TIME WHERE cust_time_id=2 )
THEN oh.TTL_UN_QT
ELSE NULL
END) AS SALEABLE_QTY
,SUM(CASE
WHEN oh.WK_NBR = (SELECT LY_WK_NBR FROM ALEX_ARP_VIEWS_PRD.REF_CUSTOM_TIME WHERE cust_time_id=2 )
THEN oh.TTL_UN_QT
ELSE NULL
END) AS SALEABLE_QTY_LY
,SUM(CASE
WHEN (oh.LOCN_NBR,oh.WK_NBR) IN (SELECT LOCN_NBR,PRI_START_WK FROM VT_STORES)
THEN oh.TTL_UN_QT
ELSE NULL
END) AS BEGINNING_OH_LY
,SUM(CASE
WHEN (oh.LOCN_NBR,oh.WK_NBR) IN (SELECT LOCN_NBR,START_WK FROM VT_STORES)
THEN oh.TTL_UN_QT
ELSE NULL
END) AS BEGINNING_OH
FROM
ALEX_ARP_VIEWS_PRD.FACT_WKLY_OPR_INS oh
INNER JOIN VT_STORES stores ON oh.LOCN_NBR = stores.LOCN_NBR
INNER JOIN VT_ITEM item ON oh.VEND_PACK_ID = item.VEND_PACK_ID
WHERE
INS_TYP_CD='H'
AND TTL_UN_QT <> 0
AND WK_NBR >= (SELECT MIN(PRI_START_WK) FROM VT_STORES)
GROUP BY
oh.LOCN_NBR
,item.ITEM_ID
You don't need to use IN. You can use exists:
SUM(CASE WHEN EXISTS (SELECT 1 FROM VT_STORES v WHERE oh.LOCN_NBR = v.LOCN_NBR AND oh.WK_NBR = v.START_WK)
THEN oh.TTL_UN_QT
END) AS BEGINNING_OH
However, I'm not 100% sure the problem is the IN. Many databases do not allow subqueries as arguments to aggregation functions. I'm not sure if Teradata allows this functionality.
I'm trying to consolidate a few sub-queries to avoid hitting a massive table multiple times
Multi-column subqueries are only allowed in WHERE, not in CASE. Rewriting to EXISTS will probably not improve performance; the plan might actually be more complex.
You are simply trying to hide complexity, but it is still an outer join in the background, like this:
,Sum(CASE WHEN stores.LOCN_NBR IS NOT NULL THEN oh.TTL_UN_QT END) AS BEGINNING_OH
...
FROM oh LEFT JOIN
(
SELECT LOCN_NBR,START_WK FROM VT_STORES
) AS stores
ON stores.LOCN_NBR = oh.LOCN_NBR
AND stores.START_WK = oh.WK_NBR
Can you show your current query (at least those parts you try to optimize)?
Edit:
When VT_STORES.locn_nbr is unique this should return the same result:
SELECT
oh.LOCN_NBR AS LOCN_NBR
,item.ITEM_ID AS ITEM_ID
,Sum(CASE
WHEN oh.WK_NBR = (SELECT Min(WK_NBR) FROM ALEX_ARP_VIEWS_PRD.REF_CUSTOM_TIME WHERE cust_time_id=2 )
THEN oh.TTL_UN_QT
ELSE NULL
END) AS SALEABLE_QTY
,Sum(CASE
WHEN oh.WK_NBR = (SELECT Min(LY_WK_NBR) FROM ALEX_ARP_VIEWS_PRD.REF_CUSTOM_TIME WHERE cust_time_id=2 )
THEN oh.TTL_UN_QT
ELSE NULL
END) AS SALEABLE_QTY_LY
,Sum(CASE
WHEN oh.WK_NBR= stores.PRI_START_WK
THEN oh.TTL_UN_QT
END) AS BEGINNING_OH_LY
,Sum(CASE
WHEN oh.WK_NBR = stores.START_WK
THEN oh.TTL_UN_QT
END) AS BEGINNING_OH
FROM
ALEX_ARP_VIEWS_PRD.FACT_WKLY_OPR_INS oh
INNER JOIN VT_STORES stores ON oh.LOCN_NBR = stores.LOCN_NBR
INNER JOIN VT_ITEM item ON oh.VEND_PACK_ID = item.VEND_PACK_ID
WHERE
INS_TYP_CD='H'
AND TTL_UN_QT <> 0
AND WK_NBR >= (SELECT Min(PRI_START_WK) FROM VT_STORES)
GROUP BY
oh.LOCN_NBR
,item.ITEM_ID
The other Scalar Subqueries should be fine, because they return only a single row and the optimizer knows that.
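The join-based rewrite can be tried out on a toy data set. This is a minimal sketch using SQLite from Python rather than Teradata; fact_oh is a hypothetical, cut-down stand-in for ALEX_ARP_VIEWS_PRD.FACT_WKLY_OPR_INS, the sample rows are invented, and it relies on the same assumption as above, namely that locn_nbr is unique in VT_STORES.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of the tables in the question.
    CREATE TABLE vt_stores (locn_nbr INTEGER PRIMARY KEY,
                            start_wk INTEGER, pri_start_wk INTEGER);
    CREATE TABLE fact_oh (locn_nbr INTEGER, wk_nbr INTEGER, ttl_un_qt INTEGER);

    INSERT INTO vt_stores VALUES (1, 202001, 201901), (2, 202005, 201905);

    INSERT INTO fact_oh VALUES
        (1, 202001, 5),   -- store 1, its start week
        (1, 201901, 3),   -- store 1, its prior-year start week
        (1, 202002, 7),   -- store 1, some other week
        (2, 202005, 4),   -- store 2, its start week
        (2, 201905, 6),   -- store 2, its prior-year start week
        (2, 202001, 9);   -- store 1's start week, but not store 2's
""")

query = """
    SELECT oh.locn_nbr,
           SUM(CASE WHEN oh.wk_nbr = st.pri_start_wk THEN oh.ttl_un_qt END)
             AS beginning_oh_ly,
           SUM(CASE WHEN oh.wk_nbr = st.start_wk THEN oh.ttl_un_qt END)
             AS beginning_oh
    FROM fact_oh oh
    JOIN vt_stores st ON oh.locn_nbr = st.locn_nbr
    GROUP BY oh.locn_nbr
    ORDER BY oh.locn_nbr
"""
rows = conn.execute(query).fetchall()
print(rows)  # (locn_nbr, beginning_oh_ly, beginning_oh) per store
```

Because the join already matches on locn_nbr, comparing wk_nbr to the joined row's start week has the same effect as the two-column IN: the last sample row is ignored for store 2 even though its week is store 1's start week.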
re: Is there any way to do multi-column IN statements within a CASE statement
no.
re: to avoid hitting a massive table (42B rows) multiple times
which one is large? I don't see how, even if a compound IN clause worked, you would avoid hitting a large table multiple times compared to using a join/adding columns to a where clause.
have you tried something like:
SELECT
...
,SUM(CASE
WHEN (SELECT COUNT(*) FROM VT_STORES
WHERE LOCN_NBR = oh.LOCN_NBR AND
PRI_START_WK = oh.WK_NBR) >= 1
THEN oh.TTL_UN_QT
END) AS BEGINNING_OH_LY
,SUM(CASE
WHEN (SELECT COUNT(*) FROM VT_STORES
WHERE LOCN_NBR = oh.LOCN_NBR AND
START_WK = oh.WK_NBR) >= 1
THEN oh.TTL_UN_QT
END) AS BEGINNING_OH
FROM
...
?
If it's VT_STORES that's large, try:
ALEX_ARP_VIEWS_PRD.FACT_WKLY_OPR_INS oh INNER JOIN
(select
LOCN_NBR,
PRI_START_WK,
START_WK
from
VT_STORES) stores ON
oh.LOCN_NBR = stores.LOCN_NBR and
(PRI_START_WK = oh.WK_NBR OR
START_WK = oh.WK_NBR)
I've seen performance hits from using OR clauses in joins, though, so you might be better off with two inner joins, one for each week you want.

How to include several count functions and GROUP BYs in one line

I am trying to get the number of customers by their types and groups all in line as such:
GroupName | GroupNotes | Count(Type1) | Count(Type2) | Count(Type3)
but instead I can only get the group id, the type id and the number of customers of each type in the group, using the following query:
SELECT
CustomersGroups.idCustomerGroup , Customers.type , COUNT(*)
FROM
CustomersGroups
inner Join CustomersInGroup on CustomersGroups.idCustomerGroup = CustomersInGroup.idCustomerGroup
inner Join Customers on Customers.idCustomer = CustomersInGroup.idCustomer
Group by
CustomersGroups.idCustomerGroup, Customers.type
Is there a way to show them in a single line (and show the name of the group)?
This is a "pivot" query. Some databases directly support pivot syntax. In all, you can use conditional aggregation.
Perhaps more importantly, you should learn to use table aliases. These make queries easier to write and to read:
select cg.idCustomerGroup,
sum(case when c.type = 'Type1' then 1 else 0 end) as num_type1,
sum(case when c.type = 'Type2' then 1 else 0 end) as num_type2,
sum(case when c.type = 'Type3' then 1 else 0 end) as num_type3
from CustomersGroups cg inner Join
CustomersInGroup cig
on cg.idCustomerGroup = cig.idCustomerGroup inner Join
Customers c
on c.idCustomer = cig.idCustomer
Group by cg.idCustomerGroup;
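To also show the group name, one option is to add it to both the select list and the group by. The sketch below does that on an in-memory SQLite database from Python. The GroupName column is an assumption taken from the desired output header in the question, the type values 'Type1' to 'Type3' are placeholders as in the query above, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- GroupName is an assumed column, based on the desired output header.
    CREATE TABLE CustomersGroups (idCustomerGroup INTEGER, GroupName TEXT);
    CREATE TABLE Customers (idCustomer INTEGER, type TEXT);
    CREATE TABLE CustomersInGroup (idCustomerGroup INTEGER, idCustomer INTEGER);

    -- Hypothetical sample data.
    INSERT INTO CustomersGroups VALUES (1, 'Retail'), (2, 'Wholesale');
    INSERT INTO Customers VALUES
        (1, 'Type1'), (2, 'Type1'), (3, 'Type2'), (4, 'Type3'), (5, 'Type2');
    INSERT INTO CustomersInGroup VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5);
""")

query = """
    SELECT cg.idCustomerGroup,
           cg.GroupName,
           SUM(CASE WHEN c.type = 'Type1' THEN 1 ELSE 0 END) AS num_type1,
           SUM(CASE WHEN c.type = 'Type2' THEN 1 ELSE 0 END) AS num_type2,
           SUM(CASE WHEN c.type = 'Type3' THEN 1 ELSE 0 END) AS num_type3
    FROM CustomersGroups cg
    INNER JOIN CustomersInGroup cig ON cg.idCustomerGroup = cig.idCustomerGroup
    INNER JOIN Customers c ON c.idCustomer = cig.idCustomer
    GROUP BY cg.idCustomerGroup, cg.GroupName
    ORDER BY cg.idCustomerGroup
"""
rows = conn.execute(query).fetchall()
print(rows)  # one row per group, one count column per customer type
```

Each SUM(CASE ...) turns one type value into its own column, which is why the type no longer needs to appear in the group by.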

To compute sum regarding to a constraint

I'm using PostgreSQL 8.4.
I have the following sql-query:
SELECT p.partner_id,
CASE WHEN pa.currency_id = 1 THEN SUM(amount) ELSE 0 END AS curUsdAmount,
CASE WHEN pa.currency_id = 2 THEN SUM(amount) ELSE 0 END AS curRubAmount,
CASE WHEN pa.currency_id = 3 THEN SUM(amount) ELSE 0 END AS curUahAmount
FROM public.player_account AS pa
JOIN player AS p ON p.id = pa.player_id
WHERE p.partner_id IN (819)
GROUP BY p.partner_id, pa.currency_id
The thing is that the query does not do what I expected. I realize that, but now I want to understand what exactly the query does. I mean, which SUM is computed when the query is executed? Could you clarify?
I think you have the conditions backwards in the query:
SELECT p.partner_id,
SUM(CASE WHEN pa.currency_id = 1 THEN amount ELSE 0 END) AS curUsdAmount,
SUM(CASE WHEN pa.currency_id = 2 THEN amount ELSE 0 END) AS curRubAmount,
SUM(CASE WHEN pa.currency_id = 3 THEN amount ELSE 0 END) AS curUahAmount
FROM public.player_account pa JOIN
player p
ON p.id = pa.player_id
WHERE p.partner_id IN (819)
GROUP BY p.partner_id;
Note that I also removed currency_id from the group by clause.
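To see what the original query actually computes, here is a small sketch that runs both versions side by side. It uses SQLite through Python instead of PostgreSQL 8.4, with invented sample rows and assumed column types; an ORDER BY is added to each query only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (id INTEGER, partner_id INTEGER);
    CREATE TABLE player_account (player_id INTEGER, currency_id INTEGER,
                                 amount INTEGER);

    -- Hypothetical data: two players of partner 819, three currencies.
    INSERT INTO player VALUES (1, 819), (2, 819);
    INSERT INTO player_account VALUES (1, 1, 10), (1, 2, 20), (2, 1, 5), (2, 3, 7);
""")

# Original shape: the SUM is taken per (partner_id, currency_id) group, and the
# CASE then only decides which output column that group's total lands in.
original = """
    SELECT p.partner_id,
           CASE WHEN pa.currency_id = 1 THEN SUM(amount) ELSE 0 END AS curUsdAmount,
           CASE WHEN pa.currency_id = 2 THEN SUM(amount) ELSE 0 END AS curRubAmount,
           CASE WHEN pa.currency_id = 3 THEN SUM(amount) ELSE 0 END AS curUahAmount
    FROM player_account pa
    JOIN player p ON p.id = pa.player_id
    WHERE p.partner_id IN (819)
    GROUP BY p.partner_id, pa.currency_id
    ORDER BY pa.currency_id
"""

# Rewritten shape: the CASE runs per row inside the SUM, one group per partner.
rewritten = """
    SELECT p.partner_id,
           SUM(CASE WHEN pa.currency_id = 1 THEN amount ELSE 0 END) AS curUsdAmount,
           SUM(CASE WHEN pa.currency_id = 2 THEN amount ELSE 0 END) AS curRubAmount,
           SUM(CASE WHEN pa.currency_id = 3 THEN amount ELSE 0 END) AS curUahAmount
    FROM player_account pa
    JOIN player p ON p.id = pa.player_id
    WHERE p.partner_id IN (819)
    GROUP BY p.partner_id
    ORDER BY p.partner_id
"""

per_currency = conn.execute(original).fetchall()
per_partner = conn.execute(rewritten).fetchall()
print(per_currency)  # one row per currency, with a single non-zero column each
print(per_partner)   # one row per partner, with all three totals filled in
```

So the original query is not wrong in its arithmetic; it returns one row per currency with the total in one column and zeros in the others, instead of a single row per partner.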
Maybe one row per (partner_id, currency_id) does the job. Faster and cleaner that way:
SELECT p.partner_id, pa.currency_id, sum(amount) AS sum_amount
FROM player_account pa
JOIN player p ON p.id = pa.player_id
WHERE p.partner_id = 819
AND pa.currency_id IN (1,2,3) -- may be redundant if there are no others
GROUP BY 1, 2;
If you need 1 row per partner_id, you are actually looking for "cross-tabulation" or a "pivot table". In Postgres use crosstab() from the additional module tablefunc, which is very fast. (Also available for the outdated version 8.4):
SELECT * FROM crosstab(
'SELECT p.partner_id, pa.currency_id, sum(amount)
FROM player_account pa
JOIN player p ON p.id = pa.player_id
WHERE p.partner_id = 819
AND pa.currency_id IN (1,2,3)
GROUP BY 1, 2
ORDER BY 1, 2'
,'VALUES (1), (2), (3)'
) AS t (partner_id int, "curUsdAmount" numeric
, "curRubAmount" numeric
, "curUahAmount" numeric); -- guessing data types
Adapt to your actual data types.
Detailed explanation:
PostgreSQL Crosstab Query