How to interchange rows and columns in a pandas dataframe - pandas

I'm reading a CSV file using the pandas library. I want to interchange rows and columns, but the main issue is the Status column: its values repeat every three rows, so a plain transpose turns every row value into a column. Instead I want only three columns, i.e. Confirmed, Recovered and Deceased, for every date. Please find the attachment where I have shown sample input as well as sample output.

It's a case of using stack() and unstack():
import random
import pandas as pd

# build a sample frame: one row per (date, status) pair
s = 10
d = pd.date_range("01-Jan-2021", periods=s)
cols = ["TT", "AN", "AP"]
df = pd.DataFrame([{**{"Date": dd, "Status": st}, **{c: random.randint(1, 50) for c in cols}}
                   for dd in d
                   for st in ["Confirmed", "Recovered", "Deceased"]])

# index by (Date, Status), move the state columns into the index with stack(),
# then unstack the Status level back out as columns
df.set_index(["Date", "Status"]).rename_axis(columns="State").stack().unstack(1)
before (first four dates shown):
          Date     Status  TT  AN  AP
0   2021-01-01  Confirmed   5  44  17
1   2021-01-01  Recovered  44   5  48
2   2021-01-01   Deceased  27   3  24
3   2021-01-02  Confirmed  33  14  38
4   2021-01-02  Recovered  21  15   6
5   2021-01-02   Deceased  15  37   8
6   2021-01-03  Confirmed  15  20  36
7   2021-01-03  Recovered  18  19  44
8   2021-01-03   Deceased  37  22   1
9   2021-01-04  Confirmed  16  35  37
10  2021-01-04  Recovered  30  45  49
11  2021-01-04   Deceased  35   7  18
after (first four dates shown):
Status            Confirmed  Deceased  Recovered
Date       State
2021-01-01 TT             5        27         44
           AN            44         3          5
           AP            17        24         48
2021-01-02 TT            33        15         21
           AN            14        37         15
           AP            38         8          6
2021-01-03 TT            15        37         18
           AN            20        22         19
           AP            36         1         44
2021-01-04 TT            16        35         30
           AN            35         7         45
           AP            37        18         49
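The same reshape can also be written with melt() and pivot_table(); a minimal sketch, assuming the sample frame built above (the names long_df, out and Count are illustrative):
# melt the state columns into long form, then pivot Status back out as columns
long_df = df.melt(id_vars=["Date", "Status"], var_name="State", value_name="Count")
out = long_df.pivot_table(index=["Date", "State"], columns="Status", values="Count")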

Related

How can I get an aggregate sum of the average number of products given between two weeks, with output for each week as shown below, in Pandas?

StartWeek  End Week  Number of Weeks  Number of Products  Avg number of product per week
       39        41                3                  99                              33
       40        45                5                 150                              30
       40        42                3                  60                              20
       39        40                2                  40                              20
       39        41                3                  99                              33
So that the output looks like:
Week  Sum Average Product per week
  39                            86
  40                            70
  41                            66
  42                            20
  45                            30
First, for each row we create the list of weeks it applies to and store it in a column 'weeks':
import numpy as np

df['weeks'] = df.apply(lambda r: np.arange(r['StartWeek'], r['EndWeek'] + 1), axis=1)
df looks like this:
    StartWeek  EndWeek  NumberofWeek  NumberofProduct  Av  weeks
--  ---------  -------  ------------  ---------------  --  -------------------
 0         39       41             3               99  33  [39 40 41]
 1         40       45             5              150  30  [40 41 42 43 44 45]
 2         40       42             3               60  20  [40 41 42]
 3         39       40             2               40  20  [39 40]
 4         39       41             3               99  33  [39 40 41]
Then we explode 'weeks', which duplicates each row once per week it applies to, and aggregate the exploded week column with a sum:
df.explode('weeks').groupby('weeks', as_index=False)['Av'].sum()
output:
weeks Av
-- ------- ----
0 39 86
1 40 136
2 41 116
3 42 50
4 43 30
5 44 30
6 45 30
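For reference, a minimal self-contained version of the above, with the frame constructed from the sample data (column names assumed from the answer's frame):
import numpy as np
import pandas as pd

df = pd.DataFrame({'StartWeek': [39, 40, 40, 39, 39],
                   'EndWeek':   [41, 45, 42, 40, 41],
                   'Av':        [33, 30, 20, 20, 33]})

# one row per (original row, week) pair, then sum the weekly averages per week
df['weeks'] = df.apply(lambda r: np.arange(r['StartWeek'], r['EndWeek'] + 1), axis=1)
out = df.explode('weeks').groupby('weeks', as_index=False)['Av'].sum()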
You can use the groupby method in pandas, though note this only sums by StartWeek and does not spread each row across the weeks in its range:
df = df.groupby(["StartWeek"])["Avg number of product per week"].sum()

How to calculate shift and rolling sum over missing dates without adding them to the data frame in Pandas?

I have a data set with dates, customers and income:
         Date Customer  Income
0    1/1/2018        A      53
1    2/1/2018        A      36
2    3/1/2018        A      53
3    5/1/2018        A      89
4    6/1/2018        A      84
5    8/1/2018        A      84
6    9/1/2018        A      54
7   10/1/2018        A      19
8   11/1/2018        A      44
9   12/1/2018        A      80
10   1/1/2018        B      24
11   2/1/2018        B     100
12   9/1/2018        B      40
13  10/1/2018        B      47
14  12/1/2018        B      10
15   2/1/2019        B       5
For both customers there are missing dates, as they purchased nothing in some months.
I want to add, per customer, the previous month's income and also the rolling sum of income over the last year.
Meaning, if there's a missing month, I'll see '0' in the shift(1) column of the following month that has income, and I'll see a rolling 12-month sum even if there weren't 12 observations.
This is the expected result:
         Date Customer  Income  S(1)  R(12)
0    1/1/2018        A      53     0     53
1    2/1/2018        A      36    53     89
2    3/1/2018        A      53    36    142
3    5/1/2018        A      89     0    231
4    6/1/2018        A      84    89    315
5    8/1/2018        A      84     0    399
6    9/1/2018        A      54    84    453
7   10/1/2018        A      19    54    472
8   11/1/2018        A      44    19    516
9   12/1/2018        A      80    44    596
10   1/1/2018        B      24     0     24
11   2/1/2018        B     100    24    124
12   9/1/2018        B      40     0    164
13  10/1/2018        B      47    40    211
14  12/1/2018        B      10     0    221
15   2/1/2019        B       5     0    102
So far I've added the rows with missing dates using stack and unstack, but with many dates and customers this explodes the data to millions of rows, crashing the kernel, with most rows being 0's.
You can use .shift, with logic that sets S(1) to 0 when the gap to the previous row is more than 31 days.
The rolling 12-month calculation requires figuring out a "Rolling Date" per row and using a list comprehension to decide whether or not to include each value, then taking the sum of each list per row:
import numpy as np
import pandas as pd

df['Date'] = pd.to_datetime(df['Date']).dt.date

# previous income within each customer; the first row per customer becomes 0
df['S(1)'] = df.groupby('Customer')['Income'].transform('shift').fillna(0)

# zero out S(1) where the gap to the previous row exceeds 31 days
s = (df['Date'] - df['Date'].shift()) / np.timedelta64(1, '31D') <= 1
df['S(1)'] = df['S(1)'].where(s, 0).astype(int)

# start of the trailing 12-month window ('1Y' is not a valid Timedelta unit
# in recent pandas, so use 365 days instead)
df['Rolling Date'] = df['Date'] - pd.Timedelta(days=365)

# sum this customer's income between Rolling Date (exclusive) and Date (inclusive)
df['R(12)'] = df.apply(lambda d: sum([z for x, y, z in
                                      zip(df['Customer'], df['Date'], df['Income'])
                                      if y > d['Rolling Date']
                                      if y <= d['Date']
                                      if x == d['Customer']]), axis=1)
df = df.drop('Rolling Date', axis=1)
df
Out[1]:
          Date Customer  Income  S(1)  R(12)
0   2018-01-01        A      53     0     53
1   2018-02-01        A      36    53     89
2   2018-03-01        A      53    36    142
3   2018-05-01        A      89     0    231
4   2018-06-01        A      84    89    315
5   2018-08-01        A      84     0    399
6   2018-09-01        A      54    84    453
7   2018-10-01        A      19    54    472
8   2018-11-01        A      44    19    516
9   2018-12-01        A      80    44    596
10  2018-01-01        B      24     0     24
11  2018-02-01        B     100    24    124
12  2018-09-01        B      40     0    164
13  2018-10-01        B      47    40    211
14  2018-12-01        B      10     0    221
15  2019-02-01        B       5     0    102
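As an aside, pandas can do time-based rolling sums directly on a DatetimeIndex, which avoids the per-row list comprehension; a minimal sketch, assuming a 365-day window is an acceptable stand-in for "last year" (tmp and r12 are illustrative names):
# requires dates to be sorted within each customer, which they are here
tmp = df.assign(Date=pd.to_datetime(df['Date'])).set_index('Date')
r12 = tmp.groupby('Customer')['Income'].rolling('365D').sum()
# r12 is indexed by (Customer, Date); use .reset_index() to align it back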

How to replace the last n values of a column with zero

I want to replace the last 2 values of one of the columns with zero. I understand that for NaN values I can use .fillna(0), but I would like to replace the row 6 value of the last column as well.
Weight Name Age d_id_max
0 45 Sam 14 2
1 88 Andrea 25 1
2 56 Alex 55 1
3 15 Robin 8 3
4 71 Kia 21 3
5 44 Sia 43 2
6 54 Ryan 45 1
7 34 Dimi 65 NaN
I tried df.drop(df.tail(2).index, inplace=True), but that removes the rows entirely; the output I want is:
Weight Name Age d_id_max
0 45 Sam 14 2
1 88 Andrea 25 1
2 56 Alex 55 1
3 15 Robin 8 3
4 71 Kia 21 3
5 44 Sia 43 2
6 54 Ryan 45 0
7 34 Dimi 65 0
Before pandas 0.20.0 (a long time ago) this was a job for ix, but that is now deprecated. Instead you can use:
DataFrame.iloc to get the last rows, with Index.get_loc for the position of column d_id_max:
df.iloc[-2:, df.columns.get_loc('d_id_max')] = 0
print (df)
Weight Name Age d_id_max
0 45 Sam 14 2.0
1 88 Andrea 25 1.0
2 56 Alex 55 1.0
3 15 Robin 8 3.0
4 71 Kia 21 3.0
5 44 Sia 43 2.0
6 54 Ryan 45 0.0
7 34 Dimi 65 0.0
Or DataFrame.loc with indexing index values:
df.loc[df.index[-2:], 'd_id_max'] = 0
Try .iloc and get_loc
df.iloc[[-1,-2], df.columns.get_loc('d_id_max')] = 0
Out[232]:
Weight Name Age d_id_max
0 45 Sam 14 2.0
1 88 Andrea 25 1.0
2 56 Alex 55 1.0
3 15 Robin 8 3.0
4 71 Kia 21 3.0
5 44 Sia 43 2.0
6 54 Ryan 45 0.0
7 34 Dimi 65 0.0
You can use chained indexing, though it may raise SettingWithCopyWarning and the loc/iloc forms above are preferred:
df['d_id_max'].iloc[-2:] = 0
Weight Name Age d_id_max
0 45 Sam 14 2.0
1 88 Andrea 25 1.0
2 56 Alex 55 1.0
3 15 Robin 8 3.0
4 71 Kia 21 3.0
5 44 Sia 43 2.0
6 54 Ryan 45 0.0
7 34 Dimi 65 0.0

Pandas Dataframe Merging

I have a bit of a weird pandas question.
I have a master Dataframe:
a b c
0 22 44 55
1 22 45 22
2 44 23 56
3 45 22 33
I then have a dataframe with different dimensions which has some overlapping indexes and column names:
index col_name new_value
0 a 111
3 b 234
I'm trying to say: if you find a match on index and col_name in the master dataframe, then replace the value.
So the output would be
a b c
0 111 44 55
1 22 45 22
2 44 23 56
3 45 234 33
I've found combine_first, but this doesn't work unless I pivot the second dataframe (which I can't do in this scenario).
This is an update problem. Pivot the second frame so its shape matches the master, then update it in place. (Note: pivot's arguments are keyword-only in pandas 2.0+, so the positional-unpacking shortcut pivot(*updated.columns) no longer works there; the keyword form below is equivalent.)
df.update(updated.pivot(index='index', columns='col_name', values='new_value'))
df
Out[479]:
a b c
0 111.0 44.0 55
1 22.0 45.0 22
2 44.0 23.0 56
3 45.0 234.0 33
Or assign positionally into the underlying numpy array (this only persists when .values is a view of the data, which holds for a single-dtype frame like this one):
df.values[updated['index'].values, df.columns.get_indexer(updated.col_name)] = updated.new_value.values
df
Out[495]:
a b c
0 111 44 55
1 22 45 22
2 44 23 56
3 45 234 33
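For reference, a minimal self-contained version of the update approach, with both frames built from the sample data:
import pandas as pd

df = pd.DataFrame({'a': [22, 22, 44, 45],
                   'b': [44, 45, 23, 22],
                   'c': [55, 22, 56, 33]})
updated = pd.DataFrame({'index': [0, 3],
                        'col_name': ['a', 'b'],
                        'new_value': [111, 234]})

# reshape the updates to match df's layout, then overwrite matching cells in place
df.update(updated.pivot(index='index', columns='col_name', values='new_value'))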

How to divide a result set into equal parts?

I have a table new_table
ID PROC_ID DEP_ID OLD_STAFF NEW_STAFF
1 15 43 58 ?
2 19 43 58 ?
3 29 43 58 ?
4 31 43 58 ?
5 35 43 58 ?
6 37 43 58 ?
7 38 43 58 ?
8 39 43 58 ?
9 58 43 58 ?
10 79 43 58 ?
How can I select all proc_ids and update new_staff? For example:
ID PROC_ID DEP_ID OLD_STAFF NEW_STAFF
1 15 43 58 15
2 19 43 58 15
3 29 43 58 15
4 31 43 58 15
5 35 43 58 23
6 37 43 58 23
7 38 43 58 23
8 39 43 58 28
9 58 43 58 28
10 79 43 58 28
That is, staff 15 takes 4 proc_ids, staff 23 takes 3 proc_ids, and staff 28 takes 3 proc_ids, while staff 58 is busy; staffs 15, 23, 28 and 58 are all in one department.
"how to divide equal parts"
Oracle has a function, ntile(), which splits a result set into equal buckets. For instance, this query puts your posted data into four buckets:
select id
     , proc_id
     , ntile(4) over (order by id asc) as gen_staff
from new_table;
ID PROC_ID GEN_STAFF
---------- ---------- ----------
1 15 1
2 19 1
3 29 1
4 31 2
5 35 2
6 37 2
7 38 3
8 39 3
9 58 4
10 79 4
10 rows selected.
This isn't quite the solution you want, but you need to clarify your requirements before it's possible to provide a complete answer.
update new_table
set new_staff = '15'
where ID in ('1','2','3','4');

update new_table
set new_staff = '28'
where ID in ('8','9','10');

update new_table
set new_staff = '23'
where ID in ('5','6','7');

Not sure if this is what you mean.