I would like to calculate the percentage of delivered items that were opened, per month. I have the following table:
date       | delivered | opened
-----------+-----------+-------
01/04/2021 |         1 |      1
01/04/2021 |         1 |
01/04/2021 |         1 |
08/05/2021 |         1 |      1
08/05/2021 |         1 |      1
10/03/2021 |         1 |
10/03/2021 |         1 |      1
The percentage would then be added like this:
date_month | delivered | opened | percentage_opened
-----------+-----------+--------+------------------
         4 |         1 |      1 |              0.33
         4 |         1 |        |              0.33
         4 |         1 |        |              0.33
         5 |         1 |      1 |              1
         5 |         1 |      1 |              1
         3 |         1 |        |              0.5
         3 |         1 |      1 |              0.5
I have tried the following, but get an error reading 'Internal error: system tried to run table creation for virtual table'.
select
opened,
delivered,
month(date) as date_month,
sum(opened)/sum(delivered) over(partition by month(date)) as percentage_opened
from table
;
You are close, but you need two analytic functions. You should also include the year:
select opened, delivered, month(date) as date_month,
(sum(opened) over (partition by year(date), month(date)) * 1.0 /
sum(delivered) over(partition by year(date), month(date))
) as ratio_opened
from table;
Some databases do integer division, so I threw in * 1.0 just in case yours does.
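If you are unsure whether your database truncates integer division, a quick probe makes the difference obvious (written for databases that allow SELECT without FROM, such as SQL Server or Postgres):

-- On integer-division systems (e.g. SQL Server, Postgres) the first column is 0;
-- multiplying by 1.0 first promotes the expression to decimal and gives 0.333...
select 1 / 3       as int_division,
       1 * 1.0 / 3 as decimal_division;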
I have this data set:
Id | PrevId | NextId | Product   | Process   | Date
---+--------+--------+-----------+-----------+-----------
 1 | NULL   |      4 | Product 1 | Process A | 2021-04-24
 2 | NULL   |      3 | Product 2 | Process A | 2021-04-24
 3 |      2 |      5 | Product 2 | Process A | 2021-04-24
 4 |      1 |      7 | Product 1 | Process B | 2021-04-26
 5 |      3 |      6 | Product 2 | Process B | 2021-04-24
 6 |      5 |   NULL | Product 2 | Process B | 2021-04-24
 7 |      4 |      9 | Product 1 | Process B | 2021-04-29
 9 |      7 |     10 | Product 1 | Process A | 2021-05-01
10 |      9 |     15 | Product 1 | Process A | 2021-05-03
15 |     10 |     19 | Product 1 | Process A | 2021-05-04
19 |     15 |   NULL | Product 1 | Process C | 2021-05-05
Per product, I need to tag consecutive runs ("islands") of records that have the same Process, like:
Id | PrevId | NextId | Product   | Process   | Date       | Tag
---+--------+--------+-----------+-----------+------------+----
 1 | NULL   |      4 | Product 1 | Process A | 2021-04-24 |   1
 4 |      1 |      7 | Product 1 | Process B | 2021-04-26 |   2
 7 |      4 |      9 | Product 1 | Process B | 2021-04-29 |   2
 9 |      7 |     10 | Product 1 | Process A | 2021-05-01 |   3
10 |      9 |     15 | Product 1 | Process A | 2021-05-03 |   3
15 |     10 |     19 | Product 1 | Process A | 2021-05-04 |   3
19 |     15 |   NULL | Product 1 | Process C | 2021-05-05 |   4
A product goes through multiple Processes and can go through the same one more than once.
I basically need to produce the Tag column. The logic behind it: consecutive records with the same Process should be grouped together, with the caveat that when the same Process appears again further down the line it should be treated as a new group.
I have tried the basic windowing functions (ROW_NUMBER and DENSE_RANK) but the problem is that those count within the partition and not across partitions.
You can use lag() to see where the process changes from one row to the next. Then a cumulative sum of those change flags gives the tag:
select t.*,
       sum(case when process = prev_process then 0 else 1 end) over (partition by product order by id) as tag
from (select t.*,
lag(process) over (partition by product order by id) as prev_process
from t
) t;
Here is a db<>fiddle.
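If you want to see why this works, you can surface the intermediate values before summing; for Product 1 the change flags come out as 1, 1, 0, 1, 0, 0, 1, so the running sum yields the tags 1, 2, 2, 3, 3, 3, 4 shown in the expected output (a sketch reusing the derived table from the answer above):

-- Surface prev_process and the per-row change flag that the outer sum() adds up.
select t.*,
       case when process = prev_process then 0 else 1 end as is_new_island
from (select t.*,
             lag(process) over (partition by product order by id) as prev_process
      from t
     ) t
order by product, id;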
If you don't have to validate prevId and nextId (that is if your data is already correctly ordered) you could try the following:
WITH cte AS(
SELECT *
, ROW_NUMBER() OVER (PARTITION BY Product ORDER BY [Date]) x
, DENSE_RANK() OVER (PARTITION BY Product, Process ORDER BY [Date]) y
FROM T1
WHERE product = 'Product 1'
),
cteTag AS(
SELECT Id, PrevId, NextId, Product, Process, [Date], x-y AS Tag_
FROM cte
)
SELECT Id, PrevId, NextId, Product, Process, [Date], DENSE_RANK() OVER (PARTITION BY Product ORDER BY Tag_) AS Tag
FROM cteTag
ORDER BY [Date]
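For intuition on why the x - y trick works here: within one island both row numbers advance together, so their difference stays constant, and for the Product 1 rows it comes out as 0, 1, 1, 2, 2, 2, 6 before the final DENSE_RANK turns those values into tags 1 through 4. A sketch that surfaces the helper columns (same table and filter as above):

-- Inspect x, y and their difference per row before the final ranking step.
SELECT Id, Product, Process, [Date]
     , ROW_NUMBER() OVER (PARTITION BY Product ORDER BY [Date]) AS x
     , DENSE_RANK() OVER (PARTITION BY Product, Process ORDER BY [Date]) AS y
     , ROW_NUMBER() OVER (PARTITION BY Product ORDER BY [Date])
       - DENSE_RANK() OVER (PARTITION BY Product, Process ORDER BY [Date]) AS Tag_
FROM T1
WHERE Product = 'Product 1'
ORDER BY [Date]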
There is a table ticket that contains data as shown below:
Id Impact group create_date
------------------------------------------
1 3 ABC 2020-07-28 00:42:00.0
1 2 ABC 2020-07-28 00:45:00.0
1 3 ABC 2020-07-28 00:48:00.0
1 3 ABC 2020-07-28 00:52:00.0
1 3 XYZ 2020-07-28 00:55:00.0
1 3 XYZ 2020-07-28 00:59:00.0
Expected result:
Id Impact group create_date
------------------------------------------
1 3 ABC 2020-07-28 00:42:00.0
1 2 ABC 2020-07-28 00:45:00.0
1 3 ABC 2020-07-28 00:52:00.0
1 3 XYZ 2020-07-28 00:59:00.0
At present, this is the query that I use:
WITH final AS (
SELECT p.*,
ROW_NUMBER() OVER(PARTITION BY p.id,p.group,p.impact
ORDER BY p.create_date desc, p.impact) AS rk
FROM ticket p
)
SELECT f.*
FROM final f
WHERE f.rk = 1
The result I am getting is:
Id Impact group create_date
-----------------------------------------
1 2 ABC 2020-07-28 00:45:00.0
1 3 ABC 2020-07-28 00:52:00.0
1 3 XYZ 2020-07-28 00:59:00.0
It seems that PARTITION BY is taking precedence over the ORDER BY values. Is there another way to achieve the expected result? I am running these queries on Amazon Redshift.
You could use LEAD() to check whether the Impact changes between rows, keeping only the rows where the value is about to change (or that have no later row in their group).
WITH
look_forward AS
(
SELECT
*,
LEAD(impact) OVER (PARTITION BY id, group ORDER BY create_date) AS lead_impact
FROM
ticket
)
SELECT
*
FROM
look_forward
WHERE
lead_impact IS NULL
OR lead_impact <> impact
You seem to want rows where id/impact/group changes relative to the next row. A simple way is to compare the next create_date overall with the next create_date within the same id/impact/group. If the two are the same, the row is immediately followed by a matching row and can be filtered out:
select t.*
from (select t.*,
lead(create_date) over (order by create_date) as next_create_date,
lead(create_date) over (partition by id, impact, group order by create_date) as next_create_date_img
from ticket t
) t
where next_create_date_img is null or next_create_date_img <> next_create_date;
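As a sanity check against the sample above, you could surface the two LEAD values before filtering; only the 00:48 and 00:55 rows end up with next_create_date_img equal to next_create_date, so those are the ones removed. A sketch ("group" is quoted here since GROUP is a reserved word on Redshift; adjust if your column is really named something else):

-- Surface both LEAD values; a row is dropped only when the very next event
-- overall belongs to the same id/impact/group combination.
select t.*,
       lead(create_date) over (order by create_date) as next_create_date,
       lead(create_date) over (partition by id, impact, "group"
                               order by create_date) as next_create_date_img
from ticket t
order by create_date;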
So my company has an application that has a certain "in-app currency". We record every transaction.
Recently, we found out there was a bug running for a couple of weeks that allowed users to spend currency in a certain place even when they had none. When this happened, users wouldn't get charged at all: e.g. if a user had 4 m.u. and bought something worth 10 m.u., their balance would remain at 4.
Now we need to find out who abused it and what their actual available balance is.
I want to get the columns BUG_ABUSE and WISHFUL_CUMMULATIVE, which reflect the illegitimate transactions and the amount that our users really see in their in-app wallets, but I'm running out of ideas on how to get there.
I was wondering if I could do something like a sum(estrelas) if result over 0 else lag over (partition by user order by date), or something of the like, to get the wishful cumulative.
We're using Oracle. Any help is highly appreciated.
User_ID | EVENT_DATE                 | AMOUNT | DIRECTION | RK | CUM | WISHFUL_CUMMULATIVE | BUG_ABUSE
--------+----------------------------+--------+-----------+----+-----+---------------------+----------
      1 | 02/01/2021 13:37:19,009000 |     -5 |         0 |  1 |  -5 |                   0 |         1
      1 | 08/01/2021 01:55:40,000000 |     40 |         1 |  2 |  35 |                  40 |         0
      1 | 10/01/2021 10:45:41,000000 |      2 |         1 |  3 |  37 |                  42 |         0
      1 | 10/01/2021 10:45:58,000000 |      2 |         1 |  4 |  39 |                  44 |         0
      1 | 10/01/2021 13:47:37,456000 |     -5 |         0 |  5 |  34 |                  39 |         0
      2 | 13/01/2021 20:09:59,000000 |      2 |         1 |  1 |   2 |                   2 |         0
      2 | 16/01/2021 15:14:54,000000 |    -50 |         0 |  2 | -48 |                   2 |         1
      2 | 19/01/2021 02:02:59,730000 |     -5 |         0 |  3 | -53 |                   2 |         1
      2 | 23/01/2021 21:14:40,000000 |      3 |         1 |  4 | -50 |                   5 |         0
      2 | 23/01/2021 21:14:50,000000 |     -5 |         0 |  5 | -55 |                   0 |         0
Here's something you can try. This uses recursive subquery factoring (recursive WITH clause), so it will only work in Oracle 11.2 and higher.
I use columns USER_ID, EVENT_DATE and AMOUNT from your inputs. I assume all three columns are constrained NOT NULL, two events can't have exactly the same timestamp for the same user, and AMOUNT is negative for purchases and other debits (fees, etc.) and positive for deposits or other credits.
The input data looks like this:
select user_id, event_date, amount
from sample_data
order by user_id, event_date
;
USER_ID EVENT_DATE AMOUNT
------- ----------------------------- ------
1 02/01/2021 13:37:19,009000000 -5
1 08/01/2021 01:55:40,000000000 40
1 10/01/2021 10:45:41,000000000 2
1 10/01/2021 10:45:58,000000000 2
1 10/01/2021 13:47:37,456000000 -5
2 13/01/2021 20:09:59,000000000 2
2 16/01/2021 15:14:54,000000000 -50
2 19/01/2021 02:02:59,730000000 -5
2 23/01/2021 21:14:40,000000000 3
2 23/01/2021 21:14:50,000000000 -5
Perhaps your input data has additional columns (like cumulative amount, which I left out because it plays no role in the problem or its solution). You show an RK column; I assume you computed it as a step in your attempt to solve the problem, and I re-create it in my solution below.
Here is what you can do with a recursive query (recursive WITH clause):
with
p (user_id, event_date, amount, rk) as (
select user_id, event_date, amount,
row_number() over (partition by user_id order by event_date)
from sample_data
)
, r (user_id, event_date, amount, rk, bug_flag, balance) as (
select user_id, event_date, amount, rk,
case when amount < 0 then 'bug' end, greatest(amount, 0)
from p
where rk = 1
union all
select p.user_id, p.event_date, p.amount, p.rk,
case when p.amount + r.balance < 0 then 'bug' end,
r.balance + case when r.balance + p.amount >= 0
then p.amount else 0 end
from p join r on p.user_id = r.user_id and p.rk = r.rk + 1
)
select *
from r
order by user_id, event_date
;
Output:
USER_ID EVENT_DATE AMOUNT RK BUG BALANCE
------- ----------------------------- ------ -- --- -------
1 02/01/2021 13:37:19,009000000 -5 1 bug 0
1 08/01/2021 01:55:40,000000000 40 2 40
1 10/01/2021 10:45:41,000000000 2 3 42
1 10/01/2021 10:45:58,000000000 2 4 44
1 10/01/2021 13:47:37,456000000 -5 5 39
2 13/01/2021 20:09:59,000000000 2 1 2
2 16/01/2021 15:14:54,000000000 -50 2 bug 2
2 19/01/2021 02:02:59,730000000 -5 3 bug 2
2 23/01/2021 21:14:40,000000000 3 4 5
2 23/01/2021 21:14:50,000000000 -5 5 0
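If you only need each user's final corrected balance and how many transactions were abusive, rather than the full history, you could replace the final SELECT above with an aggregate over the same r CTE. A sketch (Oracle's KEEP (DENSE_RANK LAST) picks the balance on the last row per user):

-- Per user: balance after the last event, plus the number of flagged rows.
select user_id,
       max(balance) keep (dense_rank last order by rk) as final_balance,
       count(case when bug_flag = 'bug' then 1 end)    as abuse_count
from r
group by user_id
order by user_id;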
In order to produce the result you want, you'll probably have to process the rows sequentially: once the first row for a user is processed, you compute the second one, then the third one, and so on.
Assuming the column RK is already computed and is sequential for each user, you can do:
with
n (user_id, event_date, amount, direction, rk, cum, wishful, bug_abuse) as (
select t.*,
greatest(amount, 0),
case when amount < 0 then 1 else 0 end
from t where rk = 1
union all
select
t.user_id, t.event_date, t.amount, t.direction, t.rk, t.cum,
case when n.wishful + t.amount < 0 then n.wishful
     else n.wishful + t.amount
end,
case when n.wishful + t.amount < 0 then 1 else 0 end
from n
join t on t.user_id = n.user_id and t.rk = n.rk + 1
)
select *
from n
order by user_id, rk;
I have a question regarding SQL.
Say I have the following table:
customerID | time_secs
-----------+-----------
1 | 5
1 | 4
1 | 2
2 | 1
2 | 3
3 | 6
3 | 8
I can't change the table design. I want to be able to calculate, for each unique customer, the percentage of time_secs values that are over 3.
So for example, for customer 1, it would be (2 / 3) * 100 %.
I've gotten this so far:
SELECT customerID, COUNT(time_secs)
WHERE time_secs > 3
GROUP BY service
How do I count only the rows where time_secs is above 3, and then divide by the total count of time_secs rows regardless of whether they are above 3 or not?
Thanks.
A simple method is conditional aggregation:
select customerid,
       avg(case when time_secs > 3 then 100.0 else 0 end) as ratio
from t
group by customerid;
The avg() is a convenient shorthand for:
sum(case when time_secs > 3 then 100.0 else 0 end) / count(*)
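If you prefer the explicit form, both expressions can be run side by side to confirm they agree (table name t as in the answer, column names as in the question):

-- avg() divides the conditional sum by count(*) internally, so the two
-- columns below return the same percentage per customer.
select customerID,
       avg(case when time_secs > 3 then 100.0 else 0 end) as ratio_avg,
       sum(case when time_secs > 3 then 100.0 else 0 end) / count(*) as ratio_explicit
from t
group by customerID;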
I have a list of total store visits for a customer for a month. The customer has a home store but can visit other stores. Like the table below:
MemberId | HomeStoreId | VisitedStoreId | Month | Visits
---------+-------------+----------------+-------+-------
       1 |           5 |              5 |     1 |      5
       1 |           5 |              3 |     1 |      2
       1 |           5 |              2 |     1 |      1
       1 |           5 |              4 |     1 |      7
I want my select statement to also show the number of visits to the home store against each row for that member and month, like the below:
MemberId | HomeStoreId | VisitedStoreId | Month | Visits | HomeStoreVisits
---------+-------------+----------------+-------+--------+----------------
       1 |           5 |              5 |     1 |      5 |               5
       1 |           5 |              3 |     1 |      2 |               5
       1 |           5 |              2 |     1 |      1 |               5
       1 |           5 |              4 |     1 |      7 |               5
I've looked at a SUM with CASE statements inside and OVER with PARTITION but I can't seem to work it out.
Thanks
I would use window functions:
select t.*,
sum(case when homestoreid = visitedstoreid then visits end) over
(partition by memberid, month) as homestorevisits
from t;
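Since at most one row per member and month satisfies homestoreid = visitedstoreid (as in your sample), max() over the same partition is an equivalent alternative if you find it reads better:

-- max() and sum() behave the same here because the CASE yields a value
-- on at most one row per (memberid, month) partition.
select t.*,
       max(case when homestoreid = visitedstoreid then visits end) over
           (partition by memberid, month) as homestorevisits
from t;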
SELECT Table.MemberId, HomeStoreId, VisitedStoreId, Table.Month, Visits, HomeStoreVisits
FROM Table LEFT OUTER JOIN
     (SELECT MemberId, Month, Visits AS HomeStoreVisits
      FROM Table
      WHERE HomeStoreId = VisitedStoreId
     ) T ON T.MemberId = Table.MemberId AND T.Month = Table.Month
You can achieve this using a simple subquery.
SELECT MemberId, HomeStoreID, VisitedStoreID, Month, Visits,
(SELECT Visits FROM table t2
WHERE t2.MemberId = t1.MemberId
AND t2.HomeStoreId = t1.HomeStoreId
AND t2.Month = t1.Month
AND t2.VisitedStoreId = t2.HomeStoreId) AS HomeStoreVisits
FROM table t1
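One caveat for both the window-function and the subquery versions: if a member never visits their home store in a given month, HomeStoreVisits comes back NULL. If you would rather see 0, a COALESCE around either expression covers that case; a sketch of the window-function variant (t standing in for your table):

-- COALESCE turns the missing home-store row into 0 instead of NULL.
select t.*,
       coalesce(sum(case when homestoreid = visitedstoreid then visits end) over
                    (partition by memberid, month), 0) as homestorevisits
from t;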