How to Calculate Percentages for Groups in SQL

I have a table that looks something like this:
Class  ID  Value
A      1   300
A      2   200
A      3   500
B      1   300
B      2   300
C      1   1000
Is there a way to use SQL to calculate the percentage share each ID has of its class total? For example, the percentages for class A would be 30% for ID 1, 20% for ID 2, and 50% for ID 3, and so on for the other classes:
Class  ID  Value  Percentage
A      1   300    30%
A      2   200    20%
A      3   500    50%
B      1   300    50%
B      2   300    50%
C      1   1000   100%

You can use window functions (if your database, which you did not disclose, supports them):
select
    t.*,
    1.0 * value / sum(value) over (partition by class) as ratio
from mytable t
This gives you a ratio, that is, a value between 0 and 1. I find this more relevant than a percentage, but you can multiply it by 100 if you like.
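If you want the formatted percentages from the example output, here is a minimal sketch, assuming the table is named mytable as above and a dialect such as Postgres where round() takes a scale argument and || concatenates strings:
-- round the share to a whole number and append a percent sign;
-- the || operator is an assumption, adjust for your dialect
select
    t.*,
    round(100.0 * value / sum(value) over (partition by class), 0) || '%' as percentage
from mytable t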

Related

How to calculate the leftovers of each balance top-up using the first-in-first-out technique?

Imagine we have user balances. There's a table with top-ups and withdrawals. Let's call it balance_updates:
transaction_id  user_id  current_balance  amount  created_at
1               1        100              100     ...
2               1        0                -100    ...
3               2        400              400     ...
4               2        300              -100    ...
5               2        200              -200    ...
6               2        300              100     ...
7               2        50               -50     ...
What I want to get out of this is a list of top-ups and their leftovers, using the first-in-first-out technique, for each user.
So the result could be this:
top_up  user_id  leftover
1       1        0
3       2        50
6       2        100
Honestly, I struggle to turn it into SQL, though I know how to do it on paper. Got any ideas?
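One way to sketch this with window functions, assuming transaction_id reflects chronological order and a dialect with greatest/least such as Postgres: because withdrawals always consume the oldest top-ups first, each top-up's leftover is its own amount, clipped by whatever part of the user's total withdrawals is not already absorbed by earlier top-ups.
-- sketch only: table and column names taken from the question
with topups as (
    select transaction_id, user_id, amount,
           sum(amount) over (partition by user_id
                             order by transaction_id) as cum_topup
    from balance_updates
    where amount > 0
),
withdrawn as (
    select user_id, -sum(amount) as total_out
    from balance_updates
    where amount < 0
    group by user_id
)
select t.transaction_id as top_up,
       t.user_id,
       -- what remains once earlier top-ups have covered the withdrawals
       greatest(0, least(t.amount, t.cum_topup - coalesce(w.total_out, 0))) as leftover
from topups t
left join withdrawn w on w.user_id = t.user_id
order by top_up;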

Get average of rows grouped by value intervals

I have a table as follows:
ID | Value
 1 | 5
 1 | 1000
 1 | 1500
 2 | 1000
 2 | 1800
 3 | 40
 3 | 1000
 3 | 1200
 3 | 2000
 3 | 2500
I want to obtain the average for each ID, grouped by a given range r of Value. For instance, if in this case r = 1000, the expected result would be:
ID | Value
 1 | 5
 1 | 1250
 2 | 1400
 3 | 40
 3 | 1100
 3 | 2250
I have seen that this can be done with time intervals, as seen here. My question is: how can I perform this type of group-by operation for integer/float types?
You could try this way (using FLOOR rather than ROUND, so that for example 1000 and 1500 land in the same bucket, as in your expected result):
SELECT id, AVG(value) AS AvgValue
FROM (SELECT id, value, FLOOR(value / 1000) AS range FROM yourtable) t
GROUP BY id, range
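If you also want to see which interval each average belongs to, a small variant of the same query (same assumed table name, with r = 1000 inlined) labels each group with its lower bound:
-- FLOOR makes the bucketing explicit even where integer division
-- would already truncate; bucket_start is the interval's lower bound
SELECT id,
       FLOOR(value / 1000) * 1000 AS bucket_start,
       AVG(value) AS AvgValue
FROM yourtable
GROUP BY id, FLOOR(value / 1000)
ORDER BY id, bucket_start;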

Pandas - Converting columns to percentages based on the first column's value

There is a data frame with totals and counts:
pd.DataFrame({
    'categorie': ['a', 'b', 'c'],
    'total': [100, 1000, 500],
    'x': [10, 100, 5],
    'y': [100, 1000, 500]
})
categorie  total  x    y
a          100    10   100
b          1000   100  1000
c          500    5    500
I'd like to convert the counted columns into percentages based on the totals:
categorie  total  x%  y%
a          100    10  100
b          1000   10  100
c          500    1   100
The following works for a single series:
(100 * df['x'] / df['total']).round(1)
How can I apply this to all columns in the data frame?
Try the div(), mul() and astype() methods:
df[['x%', 'y%']] = df[['x', 'y']].div(df['total'], axis=0).mul(100).astype(int)
output of df:
categorie total x y x% y%
0 a 100 10 100 10 100
1 b 1000 100 1000 10 100
2 c 500 5 500 1 100

Creating 2 "cartridges" of cumulative sum with conditions using SQL

I need to create two cumulative sums based on the value type. For example:
I have values of incoming stock units of two types, A and B, and I also have records of outgoing stock units.
If we have enough stock of type A, outgoing units should be taken out of type A; if not, they should be taken out of type B. So basically I need to create the columns "A stock" and "B stock" below, representing the current balance of each type.
I tried using a cumulative sum, but I'm having trouble with the condition... Is there a way to write this query without using a loop? (Vertica DB)
In the table below, A_stock and B_stock are the final result I need to create:
ID  Type  In   OUT   A stock  B stock  Order_id
1   A     100  0     100      0        1
1   B     50   0     100      50       2
1   A     100  0     200      50       3
1   -     0    -200  0        50       4
1   -     0    -10   0        40       5
1   B     50   0     0        90       6
1   A     40   0     40       90       7
1   -     0    -20   20       90       8
2   A     30   0     30       0        1
2   B     20   0     30       20       2
2   A     10   0     40       20       3
2   -     0    -20   20       20       4
You can use window functions, but you need a column that defines the ordering of the rows; your Order_id serves that purpose (In and OUT are quoted below because they collide with reserved words):
select t.*,
       sum(case when type = 'A' then "In" + "OUT" else 0 end)
           over (partition by id order by order_id) as a_stock,
       sum(case when type = 'B' then "In" + "OUT" else 0 end)
           over (partition by id order by order_id) as b_stock
from mytable t
This assumes that you want the stock on a per-id basis; if that's not the case, just remove the partition by clause from the over() clause.
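Note that a plain running sum per type cannot fully reproduce the expected A_stock/B_stock columns, because how much of each withdrawal hits B depends on A's balance at that exact moment (the spill-over rule). A row-by-row recurrence captures this; below is a sketch using a standard recursive CTE, where the table name mytable and the quoted In/OUT columns are assumptions from the question (check whether your database version supports recursive CTEs):
-- withdrawals ("OUT" is negative) drain a_stock first; the part A
-- cannot cover, least(0, a_stock + "OUT"), is taken from b_stock
with recursive ordered as (
    select id, type, "In", "OUT",
           row_number() over (partition by id order by order_id) as rn
    from mytable
),
running (id, rn, a_stock, b_stock) as (
    -- first row per id: stock starts from zero
    select id, rn,
           case when type = 'A' then "In" else 0 end,
           case when type = 'B' then "In" else 0 end
    from ordered
    where rn = 1
    union all
    select o.id, o.rn,
           case when o.type = 'A' then p.a_stock + o."In"
                when o.type = 'B' then p.a_stock
                else greatest(0, p.a_stock + o."OUT") end,
           case when o.type = 'B' then p.b_stock + o."In"
                when o.type = 'A' then p.b_stock
                else p.b_stock + least(0, p.a_stock + o."OUT") end
    from ordered o
    join running p on p.id = o.id and p.rn = o.rn - 1
)
select * from running order by id, rn;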

Group by and provide groups only if unique in group

I have the following dataset:
   Amount  Document Number
0  200     12345
1  90      2222
2  200     456789
3  90      4444
4  300     4789
5  300     4789
So basically I want to get group numbers for the above data (using ngroup, maybe), grouping the data on the basis of Amount. A group number should be assigned to a group only if the document numbers in that group are unique. This is what I would like the outcome to be:
   Amount  Document Number  Group
0  200     12345            1
1  90      2222             2
2  200     456789           1
3  90      4444             2
4  300     4789
5  300     4789
I think you want dense_rank(), so that all rows with the same amount get the same group number:
select t.*, dense_rank() over (order by amount) as grouping
from t;
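That numbers every amount group, though; to also leave out groups whose document numbers repeat, one sketch (table name t and the column names are taken from the answer above) combines a windowed duplicate count with the ranking:
-- groups containing a repeated document_number get NULL; note that
-- excluded groups still consume a rank number
select amount, document_number,
       case when max(dup_cnt) over (partition by amount) = 1
            then dense_rank() over (order by amount)
       end as group_number
from (
    select t.*,
           count(*) over (partition by amount, document_number) as dup_cnt
    from t
) t;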
In pandas, you could first create a mask where any Amount group that contains a duplicate is flagged as False, using groupby.transform and duplicated; then use this mask with groupby.ngroup:
mask_dup = ~(df.duplicated().groupby(df['Amount']).transform(any))
df.loc[mask_dup, 'Group'] = df[mask_dup].groupby('Amount').ngroup() + 1
print(df)
   Amount  Document Number  Group
0  200     12345            2.0
1  90      2222             1.0
2  200     456789           2.0
3  90      4444             1.0
4  300     4789             NaN
5  300     4789             NaN
If you have more than these two columns, you first need to specify the subset in duplicated.