Sum of group but keep the same value for each row in pandas

How can I solve the same problem as in this link (Sum of group but keep the same value for each row in r), but using pandas?
I can generate a separate DataFrame holding the sum for each group and then merge it back into the original.

You can use groupby & transform as below to get your output.
df['sumx'] = df.groupby(['ID', 'Group'], sort=False)['x'].transform('sum')
df['sumy'] = df.groupby(['ID', 'Group'], sort=False)['y'].transform('sum')
df
Output:
ID Group x y sumx sumy
1 1 1 1 12 3 25
2 1 1 2 13 3 25
3 1 2 3 14 3 14
4 3 1 4 15 15 48
5 3 1 5 16 15 48
6 3 1 6 17 15 48
7 3 2 7 18 15 37
8 3 2 8 19 15 37
9 4 1 9 20 30 63
10 4 1 10 21 30 63
11 4 1 11 22 30 63
12 4 2 12 23 12 23
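For comparison, the merge-based approach mentioned in the question gives the same result. A minimal sketch, assuming the sample data shown in the output above:
import pandas as pd

df = pd.DataFrame({"ID":    [1, 1, 1, 3, 3, 3, 3, 3, 4, 4, 4, 4],
                   "Group": [1, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2],
                   "x":     [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   "y":     [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]})

# sum x and y per (ID, Group), then merge the group sums back onto every row
sums = (df.groupby(['ID', 'Group'], as_index=False)[['x', 'y']]
          .sum()
          .rename(columns={'x': 'sumx', 'y': 'sumy'}))
out = df.merge(sums, on=['ID', 'Group'])
transform avoids the extra merge and keeps the original row order, which is why it is the usual choice here.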

Related

Unpivot a data-frame that has information of two teams in one row?

I have some data that holds information about two opposing teams
home_x away_x
0 7 28
1 11 10
2 11 20
3 12 15
4 12 16
I know about .melt(), which returns something like this:
variable value
0 home_x 7
1 home_x 11
2 home_x 11
3 home_x 12
4 home_x 12
So each value is a row here.
There are several attributes for each team, and I want each row to hold all the attributes for the respective team (home or away).
The ultimate goal is to have all the attributes of both teams in one row, once from each team's perspective, which would double the number of rows.
home_x away_x
0 7 28
would be transformed into:
team1_x team2_x
0 7 28
0 28 7
sample df:
   home_x  away_x  home_y  away_y
0       7      28       7      20
1      28       7      28      13
2      28       7      28       4
3       7      28       7      58
4      11      10      11      10
Try:
res = pd.DataFrame()
for c in df.columns.str.split("_").str[1].unique():
    p1 = df.filter(regex=f"{c}$")                 # the home_/away_ pair for one attribute
    c1, c2 = p1.columns
    swap = p1.rename(columns={c1: c2, c2: c1})    # swap the two columns
    res = pd.concat([res, pd.concat([p1, swap]).sort_index(ignore_index=True)], axis=1)
Then rename the columns:
import re
repl = {'home': 'team1', 'away': 'team2'}
res.columns = [re.sub('|'.join(repl.keys()), lambda x: repl[x.group()], i) for i in res.columns]
   team1_x  team2_x  team1_y  team2_y
0        7       28        7       20
1       28        7       20        7
2       28        7       28       13
3        7       28       13       28
4       28        7       28        4
5        7       28        4       28
6        7       28        7       58
7       28        7       58        7
8       11       10       11       10
9       10       11       10       11
Here is an approach:
Split the column names on the last "_" and group the columns (axis=1) by that suffix, then for each group stack the original columns on top of the column-reversed (swapped) copy, giving both the same team1/team2 labels and adding the attribute as a suffix:
def myinfo(data):
    c = data.columns.str.split("_").str[-1]
    f = lambda x: pd.DataFrame.set_axis(x, ["team1", "team2"], axis=1)
    l = [pd.concat([*map(f, (v, v.iloc[:, ::-1]))]).add_suffix(f"_{k}")
         for k, v in data.groupby(c, axis=1)]
    return pd.concat(l, axis=1).sort_index()
print(myinfo(df))
team1_x team2_x
0 7 28
0 28 7
1 11 10
1 10 11
2 11 20
2 20 11
3 12 15
3 15 12
4 12 16
4 16 12
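If the columns always come in home_*/away_* pairs, the same swap-and-stack idea can also be written without looping over the suffixes. A minimal sketch, not taken from the answers above, using the 4-column sample df:
import pandas as pd

df = pd.DataFrame({"home_x": [7, 28, 28, 7, 11],
                   "away_x": [28, 7, 7, 28, 10],
                   "home_y": [7, 28, 28, 7, 11],
                   "away_y": [20, 13, 4, 58, 10]})

# one copy from the home team's point of view, one from the away team's
home_view = df.rename(columns=lambda s: s.replace("home", "team1").replace("away", "team2"))
away_view = df.rename(columns=lambda s: s.replace("away", "team1").replace("home", "team2"))

# interleave the two views row by row, as in the outputs above
res = (pd.concat([home_view, away_view])
         .sort_index(kind="stable", ignore_index=True)[home_view.columns])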

Keep only the first value on duplicated column (set 0 to others)

Suppose I have the following situation:
A DataFrame whose first column, 'ID', may contain duplicated values.
import pandas as pd

df = pd.DataFrame({"ID":  [1, 2, 3, 4, 4, 5, 5, 5, 6, 6],
                   "l_1": [10, 12, 32, 45, 45, 20, 20, 20, 20, 20],
                   "l_2": [11, 12, 32, 11, 21, 27, 38, 12, 9, 6],
                   "l_3": [5, 9, 32, 12, 21, 21, 18, 12, 8, 1],
                   "l_4": [6, 21, 12, 77, 77, 2, 2, 2, 8, 8]})
ID l_1 l_2 l_3 l_4
1 10 11 5 6
2 12 12 9 21
3 32 32 32 12
4 45 11 12 77
4 45 21 21 77
5 20 27 21 2
5 20 38 18 2
5 20 12 12 2
6 20 9 8 8
6 20 6 1 8
When duplicated IDs occur:
I need to keep only the first value in columns l_1 and l_4 (the other duplicated rows must be set to zero).
Columns l_2 and l_3 must stay the same.
When an ID is duplicated, the values in columns l_1 and l_4 are duplicated across those rows as well.
Expected output:
ID l_1 l_2 l_3 l_4
1 10 11 5 6
2 12 12 9 21
3 32 32 32 12
4 45 11 12 77
4 0 21 21 0
5 20 27 21 2
5 0 38 18 0
5 0 12 12 0
6 20 9 8 8
6 0 6 1 0
Is there a straightforward way to accomplish this using pandas or numpy?
So far I could only accomplish it with all these steps:
x1 = df[df.duplicated(subset=['ID'], keep=False)].copy()
x1.loc[x1.groupby('ID')['l_1'].apply(lambda x: (x.shift(1) == x)), 'l_1'] = 0
x1.loc[x1.groupby('ID')['l_4'].apply(lambda x: (x.shift(1) == x)), 'l_4'] = 0
df = df.drop_duplicates(subset=['ID'], keep=False)
df = pd.concat([df, x1])
Isn't this just:
df.loc[df.duplicated('ID'), ['l_1','l_4']] = 0
Output:
ID l_1 l_2 l_3 l_4
0 1 10 11 5 6
1 2 12 12 9 21
2 3 32 32 32 12
3 4 45 11 12 77
4 4 0 21 21 0
5 5 20 27 21 2
6 5 0 38 18 0
7 5 0 12 12 0
8 6 20 9 8 8
9 6 0 6 1 0
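An equivalent way to express "every row after the first within each ID", shown only as an alternative sketch to the one-liner above, is the group-wise row counter:
# cumcount() is 0 for the first row of each ID, so this zeroes l_1 and l_4 on all later duplicates
df.loc[df.groupby('ID').cumcount() > 0, ['l_1', 'l_4']] = 0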

Removing rows and keeping consecutive rows pandas

I would like to omit the first row and keep x consecutive rows.
In the example below I would like to keep 7. How do I achieve this?
df = pd.Series(range(1,101)).to_frame()
df.columns = ['numbers']
df['numbers'][1::7]
1 2
8 9
15 16
22 23
29 30
36 37
43 44
50 51
57 58
64 65
71 72
78 79
85 86
92 93
99 100
I would like to keep the values below but continue into the next row sequence:
so remove 1 and keep 2 to 7, then remove 8 and keep 9 to 14, and so on.
df = pd.Series(range(1,101)).to_frame()
df.columns = ['numbers']
df['numbers'][1:7]
1 2
2 3
3 4
4 5
5 6
6 7
You can use loc with a modulus condition on the index:
df.loc[df.index % 7 != 0]
giving
numbers
1 2
2 3
3 4
4 5
5 6
6 7
8 9
9 10
10 11
11 12
12 13
13 14
15 16
16 17
... ...
Or use drop to remove every 7th row by position:
df.drop(df.index[::7])
numbers
1 2
2 3
3 4
4 5
5 6
6 7
8 9
9 10
10 11
11 12
12 13
13 14
15 16
16 17
17 18
18 19
.. ...
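If the block size needs to be a parameter (drop one row, then keep the next x), the same position-based filtering can be written generally. A minimal sketch, assuming a default RangeIndex as in the example:
import pandas as pd

df = pd.Series(range(1, 101)).to_frame('numbers')

x = 6                                 # rows to keep after each dropped row
kept = df[df.index % (x + 1) != 0]    # drops positions 0, 7, 14, ...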

Select for running total rank based on column values

I have a problem assigning ranks for the scenario below. In my scenario a running total is calculated from the Cnt field.
My SQL query should return Rank values like the output below. Each page should hold only 40 rows, so each rank should cover only about 40 records: whenever the running total crosses a multiple of 40, the rank should change.
It would be a great help if I could get a SQL query that returns these values.
select f1, f2, sum(f2) over(order by f1) running_total
from [dbo].[Sheet1$]
Output:
ID cnt Running Total Rank
1 4 4 1
2 5 9 1
3 4 13 1
4 4 17 1
5 4 21 1
6 5 26 1
7 4 30 1
8 4 34 1
9 4 38 1
10 4 42 2
11 4 46 2
12 4 50 2
13 4 54 2
14 4 58 2
15 4 62 2
16 4 66 2
17 4 70 2
18 4 74 2
19 4 78 2
20 4 82 3
21 4 86 3
22 4 90 3
select f1, f2,
       sum(f2) over(order by f1) as running_total,
       floor((sum(f2) over(order by f1) - 1) / 40) + 1 as [rank]
from [dbo].[Sheet1$]
Subtracting 1 before the division (and adding 1 after) makes the ranks start at 1 and change only after a full block of 40, matching the expected output above.
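For reference, the same paging rank can be computed in pandas. A minimal sketch, not part of the original answer, assuming columns f1 (the ID) and f2 (the count) as in the Sheet1$ query:
import pandas as pd

df = pd.DataFrame({'f1': range(1, 23),
                   'f2': [4, 5, 4, 4, 4, 5, 4, 4, 4] + [4] * 13})

df = df.sort_values('f1')
df['running_total'] = df['f2'].cumsum()
# a new rank starts each time the running total passes a multiple of 40
df['rank'] = (df['running_total'] - 1) // 40 + 1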

Update Query in SQL with numeric pattern in MS Access

Good day all,
I need assistance in creating an update query that groups my data.
The data in my table is actually spatial in nature and can be thought of as a matrix that is 10 columns by 5 rows. I have the ObjectID, Row and Column, but I want the column DesiredResult, which is a 2x2 grouping of the rows and columns.
So the (R, C) pairs (1,1), (1,2), (2,1) and (2,2) will have a DesiredResult of 1, while (1,3), (1,4), (2,3) and (2,4) will have a DesiredResult of 2, and so on (see below for an example).
I was able to create the R and C columns using a combination of Quotient & Mod, so I assume I would do something similar here, but I am stuck. How would I go about this query in MS Access?
ObjectID R C DesiredResult
1 1 1 1
2 1 2 1
3 1 3 2
4 1 4 2
5 1 5 3
6 1 6 3
7 1 7 4
8 1 8 4
9 1 9 5
10 1 10 5
11 2 1 1
12 2 2 1
13 2 3 2
14 2 4 2
15 2 5 3
16 2 6 3
17 2 7 4
18 2 8 4
19 2 9 5
20 2 10 5
21 3 1 6
22 3 2 6
23 3 3 7
24 3 4 7
25 3 5 8
26 3 6 8
27 3 7 9
28 3 8 9
29 3 9 10
30 3 10 10
31 4 1 6
32 4 2 6
33 4 3 7
34 4 4 7
35 4 5 8
36 4 6 8
37 4 7 9
38 4 8 9
39 4 9 10
40 4 10 10
41 5 1 11
42 5 2 11
43 5 3 12
44 5 4 12
45 5 5 13
46 5 6 13
47 5 7 14
48 5 8 14
49 5 9 15
50 5 10 15
Something like ... ?
SELECT a.Row, a.Col, Col\2 AS D1, Col Mod 2 AS D2, [D1]+[D2] AS Desired
FROM table AS a
ORDER BY a.Row, a.Col;
Remou had a close approximation but it turns out this gives me what I need. I needed both a row and a column index.
SELECT ObjectID, R, C,
Int(([C]-1)/2) AS ColIndex,
Int(([R]-1)/2) AS RowIndex,
[RowIndex]*5+[ColIndex]+1 AS DesiredResult
FROM Testing
ORDER BY ObjectID
The key to the query is that the number 2 in both ColIndex and RowIndex is the grouping size, and the number 5 used in DesiredResult is the number of 2-wide column groups per row (10 columns divided by the group size of 2).
Thanks!
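As a quick sanity check of the formula (a minimal Python sketch, not part of the original thread), the same expression reproduces the DesiredResult column over the 10 x 5 grid:
# group size 2 in both R and C; 5 is the number of 2-wide column groups per row
for r in range(1, 6):           # R: 1..5
    for c in range(1, 11):      # C: 1..10
        desired = ((r - 1) // 2) * 5 + (c - 1) // 2 + 1
        print(r, c, desired)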