SQL: calculation on two columns with multiple GROUP BY statements

I have a table which has the following columns:
user_id - includes duplicates
product_id - includes duplicates
purchases - number of purchases of given product_id
My table looks somewhat like this:
   user_id  date  product_id  purchases
0        1     1           1          4
1        1     2           1          0
2        1     3           2          0
3        1     4           2          0
4        2     1           1          1
5        2     2           1          0
6        2     3           1          1
7        3     1           2          0
8        3     2           3          0
9        4     1           5          1
My goal is to calculate the following metric:
% of products that were purchased at least once, grouped by user
For example: user 1 had 2 products; one of them was purchased at least once, the other was not purchased at all. So the metric is the number of products purchased at least once divided by the number of all products per user: 1/2 * 100 = 50%.
I have little SQL experience so I do not have any legitimate code that could be corrected.
My desired output would be like this:
   user_id  total_products  products_with_purchases  metric
0        1               2                        1     50%
1        2               1                        1    100%
2        3               2                        0      0%
3        4               1                        1    100%
I would appreciate seeing a good practice solution to this problem. Many thanks!

select user_id,
       count(distinct product_id) as total_products,
       count(distinct case when purchases > 0 then product_id end) as products_with_purchases,
       100.00 * count(distinct case when purchases > 0 then product_id end)
           / count(distinct product_id) as metric
from T as t
group by user_id
https://rextester.com/EDSY39439
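The conditional-aggregation query above can be sketched end to end with an in-memory SQLite database loaded with the question's sample rows (the table name T follows the answer's placeholder):

```python
import sqlite3

# Build the sample table from the question and run the answer's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (user_id INT, date INT, product_id INT, purchases INT);
    INSERT INTO T VALUES
        (1, 1, 1, 4), (1, 2, 1, 0), (1, 3, 2, 0), (1, 4, 2, 0),
        (2, 1, 1, 1), (2, 2, 1, 0), (2, 3, 1, 1),
        (3, 1, 2, 0), (3, 2, 3, 0),
        (4, 1, 5, 1);
""")

rows = conn.execute("""
    SELECT user_id,
           COUNT(DISTINCT product_id) AS total_products,
           COUNT(DISTINCT CASE WHEN purchases > 0 THEN product_id END)
               AS products_with_purchases,
           100.0 * COUNT(DISTINCT CASE WHEN purchases > 0 THEN product_id END)
               / COUNT(DISTINCT product_id) AS metric
    FROM T
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

for row in rows:
    print(row)  # (user_id, total_products, products_with_purchases, metric)
```

The CASE inside COUNT(DISTINCT ...) yields NULL for unpurchased rows, so only purchased products are counted; multiplying by 100.0 first forces decimal division.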

You can do this all in one query, but this is the type of situation that is easier to understand with sub-queries; the SQL optimizer should still make it fast.
select user_id,
       total_products,
       products_with_purchase,
       100.0 * products_with_purchase / total_products as metric
from (
    select -- group by user to get totals
        user_id,
        count(product_id) as total_products,
        sum(case when purchases > 0 then 1 else 0 end) as products_with_purchase
    from ( -- group by user and product to get purchase totals
        select user_id, product_id, sum(purchases) as purchases
        from T
        group by user_id, product_id
    ) X
    group by user_id
) X2

I am Mohit Sahni; you can solve the above problem with the SQL code below. Note that the purchased-product count must use count(distinct ...), not a plain sum, because a user can have several rows for the same product; the 100.0 multiplier avoids integer division:

select user_id,
       count(distinct product_id) as total_products,
       count(distinct case when purchases > 0 then product_id end) as products_with_purchases,
       100.0 * count(distinct case when purchases > 0 then product_id end)
           / count(distinct product_id) as metric
from T
group by user_id

Related

SQL (Snowflake): aggregate over a window

I have the table below:
days        balance  user_id  wanted column
2022/08/01       10        1              1
2022/08/02       11        1              1
2022/08/03       10        1              1
2022/08/03        0        2              1
2022/08/05        3        2              2
2022/08/06        3        2              2
2022/08/07        3        3              3
2022/08/08        0        2              3
Since I'm new to SQL I couldn't get the aggregate-over-window clauses right. What I want is the number of unique users that have had balance > 0, as of each day (the wanted column). Thanks.
Update: exact output wanted:
days        unique users
2022/08/01             1
2022/08/02             1
2022/08/03             1
2022/08/05             2
2022/08/06             2
2022/08/07             3
2022/08/08             3
Update: what if I want to accumulate the number of unique users over time, counting only users who didn't exist before and whose balance > 0?
Everyone's help is deeply appreciated :)
SELECT *,
       COUNT(DISTINCT CASE WHEN balance > 0 THEN user_id END)
           OVER (ORDER BY days)
FROM your_table
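The windowed COUNT(DISTINCT ...) above is Snowflake syntax. As a runnable sketch of the same logic, the snippet below uses SQLite (which does not allow DISTINCT inside window functions) and computes the identical running distinct count with a correlated subquery; the table name `balances` is an assumption, and the data is the question's sample:

```python
import sqlite3

# Load the question's sample rows into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE balances (days TEXT, balance INT, user_id INT);
    INSERT INTO balances VALUES
        ('2022/08/01', 10, 1), ('2022/08/02', 11, 1),
        ('2022/08/03', 10, 1), ('2022/08/03',  0, 2),
        ('2022/08/05',  3, 2), ('2022/08/06',  3, 2),
        ('2022/08/07',  3, 3), ('2022/08/08',  0, 2);
""")

# For each day, count the distinct users that have had balance > 0
# on any row up to and including that day (a running distinct count).
rows = conn.execute("""
    SELECT d.days,
           (SELECT COUNT(DISTINCT b.user_id)
            FROM balances b
            WHERE b.balance > 0 AND b.days <= d.days) AS unique_users
    FROM (SELECT DISTINCT days FROM balances) d
    ORDER BY d.days
""").fetchall()

for row in rows:
    print(row)
```

The output matches the wanted column: one row per day, with the cumulative number of distinct users seen with a positive balance.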

Show two different sum columns based on a single column

Show two different sum columns based on another column.
For this table:
ID  Item  Quantity  Location
 1     1        10         A
 2     1        10         B
 3     1        10         A
 4     2        10         A
 5     2        10         A
 6     2        10         B
 7     3        10         A
 8     3        20         A
I need to see the total quantities for both location A and location B (to compare which is higher), but only for items that have a location B:
Expected result:
Item  Quantity A  Quantity B
   1          20          10
   2          20          10
I've been trying this but getting errors:
SELECT st.item, st.qty ALIAS(stqty),
(SELECT SUM(dc.qty)
FROM table dc
WHERE st.item = dc.item) ALIAS(dcqty))
FROM table st
WHERE location ='b'
I can do this easily with two queries obviously, but I was hoping for a way to do it in one.
You can use SUM with a CASE expression to do the pivot, then a HAVING clause to exclude rows with no total for B.
Here is the fiddle:
https://www.db-fiddle.com/f/rS8fgvWoFxn879Utc2CKbu/0
select Item,
       sum(case when Location = 'A' then Quantity else 0 end) as quantity_a,
       sum(case when Location = 'B' then Quantity else 0 end) as quantity_b
from myTable
group by Item
having sum(case when Location = 'B' then Quantity else 0 end) > 0
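The pivot-plus-HAVING answer can be verified with an in-memory SQLite database holding the question's sample rows (table name myTable as in the fiddle):

```python
import sqlite3

# Build the sample table and run the pivot with a HAVING filter on B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (ID INT, Item INT, Quantity INT, Location TEXT);
    INSERT INTO myTable VALUES
        (1, 1, 10, 'A'), (2, 1, 10, 'B'), (3, 1, 10, 'A'),
        (4, 2, 10, 'A'), (5, 2, 10, 'A'), (6, 2, 10, 'B'),
        (7, 3, 10, 'A'), (8, 3, 20, 'A');
""")

rows = conn.execute("""
    SELECT Item,
           SUM(CASE WHEN Location = 'A' THEN Quantity ELSE 0 END) AS qty_a,
           SUM(CASE WHEN Location = 'B' THEN Quantity ELSE 0 END) AS qty_b
    FROM myTable
    GROUP BY Item
    HAVING SUM(CASE WHEN Location = 'B' THEN Quantity ELSE 0 END) > 0
    ORDER BY Item
""").fetchall()

for row in rows:
    print(row)  # item 3 is filtered out because it has no location-B rows
```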

SQL Group by Sales Rep - Select 2 counts

I would like to query a table of leads assigned to sales reps and return, grouped by agent, the unique number of leads and the number sold. There can be multiple leads from one buyer, so I need a distinct count so that each buyer is counted only once. Here is the layout of the data:
AgentId  BuyerEmail      Product  Category
1        lisa#gmail.com  Jeans    1
1        lisa#gmail.com  Hat      1
1        ryan#gmail.com  Shoes    3
2        mark#gmail.com  Jeans    1
2        mark#gmail.com  Socks    1
2        mark#gmail.com  Hat      1
4        john#gmail.com  Shirt    3
5        lou#gmail.com   Hat      3
5        tim#gmail.com   Shirt    3
I would like to return a dataset like the following:
AgentId  UniqueLeads  QtySold
1        2            1
2        1            0
4        1            1
5        2            2
I can query this individually but I can't get it to return in one result set. Here are the 2 separate queries:
SELECT COUNT(DISTINCT BuyerEmail) FROM SalesLeads GROUP BY InitialAgent
SELECT COUNT(DISTINCT BuyerEmail) FROM SalesLeads WHERE Category = 3 GROUP BY InitialAgent
How can I query the table and have both data points return in one result set? Please note, a category = 3 means it is sold.
You can use conditional aggregation to calculate QtySold in the same statement:
select AgentId,
       count(distinct BuyerEmail) as UniqueLeads,
       count(case when Category = 3 then Category end) as QtySold
from SalesLeads
group by AgentId
When Category is anything other than 3, the CASE expression returns NULL, so that record isn't counted in the QtySold calculation.
db<>fiddle
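As a runnable check, the conditional-aggregation answer can be executed against the question's sample data in SQLite (the '#' in the emails mirrors the obfuscation in the post):

```python
import sqlite3

# Build the SalesLeads sample table and run the answer's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalesLeads (AgentId INT, BuyerEmail TEXT, Product TEXT, Category INT);
    INSERT INTO SalesLeads VALUES
        (1, 'lisa#gmail.com', 'Jeans', 1), (1, 'lisa#gmail.com', 'Hat', 1),
        (1, 'ryan#gmail.com', 'Shoes', 3), (2, 'mark#gmail.com', 'Jeans', 1),
        (2, 'mark#gmail.com', 'Socks', 1), (2, 'mark#gmail.com', 'Hat', 1),
        (4, 'john#gmail.com', 'Shirt', 3), (5, 'lou#gmail.com', 'Hat', 3),
        (5, 'tim#gmail.com', 'Shirt', 3);
""")

rows = conn.execute("""
    SELECT AgentId,
           COUNT(DISTINCT BuyerEmail) AS UniqueLeads,
           COUNT(CASE WHEN Category = 3 THEN Category END) AS QtySold
    FROM SalesLeads
    GROUP BY AgentId
    ORDER BY AgentId
""").fetchall()

for row in rows:
    print(row)  # (AgentId, UniqueLeads, QtySold)
```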

Does Oracle allow to do a sum over a partition but only when it obeys certain conditions, otherwise use a lag?

So my company has an application with a certain "in-app currency", and we record every transaction.
Recently, we found out there was a bug running for a couple of weeks that allowed users to spend currency in a certain place even when they had none. When this happened, users weren't charged at all: e.g., a user had 4 m.u. and bought something worth 10 m.u.; their balance remained at 4.
Now we need to find out who abused it and what their available balance is.
I want to get the columns BUG_ABUSE and WISHFUL_CUMMULATIVE, which reflect the illegitimate transactions and the amount that our users really see in their in-app wallets, but I'm running out of ideas on how to get there.
I was wondering if I could do something like a sum(estrelas) if the result stays over 0, else a lag over (partition by user order by date), or something of the like, to get the wishful cumulative.
We're using Oracle. Any help is highly appreciated.
User_ID  EVENT_DATE                  AMOUNT  DIRECTION  RK  CUM  WISHFUL_CUMMULATIVE  BUG_ABUSE
1        02/01/2021 13:37:19,009000      -5          0   1   -5                    0          1
1        08/01/2021 01:55:40,000000      40          1   2   35                   40          0
1        10/01/2021 10:45:41,000000       2          1   3   37                   42          0
1        10/01/2021 10:45:58,000000       2          1   4   39                   44          0
1        10/01/2021 13:47:37,456000      -5          0   5   34                   39          0
2        13/01/2021 20:09:59,000000       2          1   1    2                    2          0
2        16/01/2021 15:14:54,000000     -50          0   2  -48                    2          1
2        19/01/2021 02:02:59,730000      -5          0   3  -53                    2          1
2        23/01/2021 21:14:40,000000       3          1   4  -50                    5          0
2        23/01/2021 21:14:50,000000      -5          0   5  -55                    0          0
Here's something you can try. This uses recursive subquery factoring (recursive WITH clause), so it will only work in Oracle 11.2 and higher.
I use columns USER_ID, EVENT_DATE and AMOUNT from your inputs. I assume all three columns are constrained NOT NULL, two events can't have exactly the same timestamp for the same user, and AMOUNT is negative for purchases and other debits (fees, etc.) and positive for deposits or other credits.
The input data looks like this:
select user_id, event_date, amount
from sample_data
order by user_id, event_date
;
USER_ID EVENT_DATE AMOUNT
------- ----------------------------- ------
1 02/01/2021 13:37:19,009000000 -5
1 08/01/2021 01:55:40,000000000 40
1 10/01/2021 10:45:41,000000000 2
1 10/01/2021 10:45:58,000000000 2
1 10/01/2021 13:47:37,456000000 -5
2 13/01/2021 20:09:59,000000000 2
2 16/01/2021 15:14:54,000000000 -50
2 19/01/2021 02:02:59,730000000 -5
2 23/01/2021 21:14:40,000000000 3
2 23/01/2021 21:14:50,000000000 -5
Perhaps your input data has additional columns (like the cumulative amount, which I left out because it plays no role in the problem or its solution). You show an RK column; I assume you computed it as a step in your attempt to solve the problem. I re-create it in my solution below.
Here is what you can do with a recursive query (recursive WITH clause):
with
  p (user_id, event_date, amount, rk) as (
    select user_id, event_date, amount,
           row_number() over (partition by user_id order by event_date)
    from sample_data
  ),
  r (user_id, event_date, amount, rk, bug_flag, balance) as (
    select user_id, event_date, amount, rk,
           case when amount < 0 then 'bug' end,
           greatest(amount, 0)
    from p
    where rk = 1
    union all
    select p.user_id, p.event_date, p.amount, p.rk,
           case when p.amount + r.balance < 0 then 'bug' end,
           r.balance + case when r.balance + p.amount >= 0
                            then p.amount else 0 end
    from p join r on p.user_id = r.user_id and p.rk = r.rk + 1
  )
select *
from r
order by user_id, event_date;
Output:
USER_ID EVENT_DATE AMOUNT RK BUG BALANCE
------- ----------------------------- ------ -- --- -------
1 02/01/2021 13:37:19,009000000 -5 1 bug 0
1 08/01/2021 01:55:40,000000000 40 2 40
1 10/01/2021 10:45:41,000000000 2 3 42
1 10/01/2021 10:45:58,000000000 2 4 44
1 10/01/2021 13:47:37,456000000 -5 5 39
2 13/01/2021 20:09:59,000000000 2 1 2
2 16/01/2021 15:14:54,000000000 -50 2 bug 2
2 19/01/2021 02:02:59,730000000 -5 3 bug 2
2 23/01/2021 21:14:40,000000000 3 4 5
2 23/01/2021 21:14:50,000000000 -5 5 0
In order to produce the result you want you'll probably want to process the rows sequentially: once the first row is processed for a user you'll compute the second one, then the third one, and so on.
Assuming the column RK is already computed and sequential for each user you can do:
with
  n (user_id, event_date, amount, direction, rk, cum, wishful, bug_abuse) as (
    select t.*,
           greatest(amount, 0),
           case when amount < 0 then 1 else 0 end
    from t
    where rk = 1
    union all
    select t.user_id, t.event_date, t.amount, t.direction, t.rk, t.cum,
           case when n.wishful + t.amount < 0 then n.wishful
                else n.wishful + t.amount
           end,
           case when n.wishful + t.amount < 0 then 1 else 0 end
    from n
    join t on t.user_id = n.user_id and t.rk = n.rk + 1
  )
select *
from n
order by user_id, rk;
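The sequential logic above can be sanity-checked by porting the recursive CTE to SQLite, which supports WITH RECURSIVE; SQLite's two-argument MAX plays the role of Oracle's GREATEST, and the table `t` (with RK precomputed, as the answer assumes) is reduced here to the columns the recursion actually needs:

```python
import sqlite3

# Minimal version of the question's data: user, row number, amount.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (user_id INT, rk INT, amount INT);
    INSERT INTO t VALUES
        (1, 1, -5), (1, 2, 40), (1, 3, 2), (1, 4, 2), (1, 5, -5),
        (2, 1,  2), (2, 2, -50), (2, 3, -5), (2, 4, 3), (2, 5, -5);
""")

# The wishful balance only moves when the user can cover the debit;
# otherwise the row is flagged as bug abuse and the balance is kept.
rows = conn.execute("""
    WITH RECURSIVE n (user_id, rk, amount, wishful, bug_abuse) AS (
        SELECT user_id, rk, amount,
               MAX(amount, 0),
               CASE WHEN amount < 0 THEN 1 ELSE 0 END
        FROM t WHERE rk = 1
        UNION ALL
        SELECT t.user_id, t.rk, t.amount,
               CASE WHEN n.wishful + t.amount < 0 THEN n.wishful
                    ELSE n.wishful + t.amount END,
               CASE WHEN n.wishful + t.amount < 0 THEN 1 ELSE 0 END
        FROM n JOIN t ON t.user_id = n.user_id AND t.rk = n.rk + 1
    )
    SELECT user_id, rk, wishful, bug_abuse FROM n
    ORDER BY user_id, rk
""").fetchall()

for row in rows:
    print(row)
```

The computed wishful and bug_abuse values match the WISHFUL_CUMMULATIVE and BUG_ABUSE columns in the question's table.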

Finding Percentage Via Query

I have a question regarding SQL.
Say I have the following table:
customerID | time_secs
-----------+-----------
1 | 5
1 | 4
1 | 2
2 | 1
2 | 3
3 | 6
3 | 8
I can't change the table design. I want to be able to calculate for each unique customer, the percent of time_secs that is over 3.
So for example, for customer 1, it would be (2 / 3) * 100 %.
I've gotten this so far:
SELECT customerID, COUNT(time_secs)
WHERE time_secs > 3
GROUP BY service
How do I count only the time_secs values above 3, and also divide by the total count of time_secs, whether above 3 or not?
Thanks.
A simple method is conditional aggregation:
select customerid,
       avg(case when time_secs > 3 then 100.0 else 0 end) as ratio
from t
group by customerid;
The avg() is a convenient shorthand for:
sum(case when time_secs > 3 then 100.0 else 0 end) / count(*)
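The avg() shorthand can be demonstrated on the question's sample data in SQLite (column name time_secs as in the question; `t` is a placeholder table name):

```python
import sqlite3

# Build the sample table and compute the percentage via AVG of 100/0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (customerID INT, time_secs INT);
    INSERT INTO t VALUES (1, 5), (1, 4), (1, 2), (2, 1), (2, 3), (3, 6), (3, 8);
""")

rows = conn.execute("""
    SELECT customerID,
           AVG(CASE WHEN time_secs > 3 THEN 100.0 ELSE 0 END) AS ratio
    FROM t
    GROUP BY customerID
    ORDER BY customerID
""").fetchall()

for row in rows:
    print(row)
```

Customer 1 has two of three rows over 3 seconds, giving (2/3)*100 as in the question's example; note that time_secs = 3 does not count, since the condition is strictly greater than 3.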