
How to create a column and reindex in pandas?
I'm a new pandas learner. I have a 5-row DataFrame as follows:
A B C D
0 2.34 3.16 99.0 3.2
1 2.1 55.5 77.5 1
2 22.1 54 89 33
3 23 1.24 4.7 5
4 45 2.5 8.7 99
I want to replace the index 0, 1, ..., 4 with a new index 1 to 5. My expected output is:
A B C D
1 2.34 3.16 99.0 3.2
2 2.1 55.5 77.5 1
3 22.1 54 89 33
4 23 1.24 4.7 5
5 45 2.5 8.7 99
What I did was create a new index as a DataFrame:
new_index = pd.DataFrame({'#': range(1, 5 + 1, 1)})
Then I tried to reindex:
df.reindex(new_index)
But I got error:
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
What should I do to replace the old index? Thanks.

Use set_index
In [5081]: df.set_index([range(1, 6)])
Out[5081]:
A B C D
1 2.34 3.16 99.0 3.2
2 2.10 55.50 77.5 1.0
3 22.10 54.00 89.0 33.0
4 23.00 1.24 4.7 5.0
5 45.00 2.50 8.7 99.0
Or assign to df.index directly
In [5082]: df.index = range(1, 6)
In [5083]: df
Out[5083]:
A B C D
1 2.34 3.16 99.0 3.2
2 2.10 55.50 77.5 1.0
3 22.10 54.00 89.0 33.0
4 23.00 1.24 4.7 5.0
5 45.00 2.50 8.7 99.0
Details
Original df
In [5085]: df
Out[5085]:
A B C D
0 2.34 3.16 99.0 3.2
1 2.10 55.50 77.5 1.0
2 22.10 54.00 89.0 33.0
3 23.00 1.24 4.7 5.0
4 45.00 2.50 8.7 99.0

You need .values
df.index = df.index.values + 1
df
Out[141]:
A B C D
1 2.34 3.16 99.0 3.2
2 2.10 55.50 77.5 1.0
3 22.10 54.00 89.0 33.0
4 23.00 1.24 4.7 5.0
5 45.00 2.50 8.7 99.0
Or, as per @Zero:
df.index += 1
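Both approaches from the answers can be checked with a minimal sketch (using only the question's A and B columns):

```python
import pandas as pd

# Toy 5-row frame with the default RangeIndex 0..4
df = pd.DataFrame({'A': [2.34, 2.1, 22.1, 23.0, 45.0],
                   'B': [3.16, 55.5, 54.0, 1.24, 2.5]})

# Option 1: set_index returns a new DataFrame with the given index
shifted = df.set_index(pd.Index(range(1, 6)))

# Option 2: reassign the index in place
df.index = df.index + 1  # same effect as df.index += 1

print(shifted.index.tolist())  # [1, 2, 3, 4, 5]
print(df.index.tolist())       # [1, 2, 3, 4, 5]
```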


How to replace values in a dataframe with values in another dataframe

I have 2 dataframes
df_1:
Week Day Coeff_1 ... Coeff_n
1 1 12 23
1 2 11 19
1 3 23 68
1 4 57 81
1 5 35 16
1 6 0 0
1 7 0 0
...
50 1 12 23
50 2 11 19
50 3 23 68
50 4 57 81
50 5 35 16
50 6 0 0
50 7 0 0
df_2:
Week Day Coeff_1 ... Coeff_n
1 1 0 0
1 2 0 0
1 3 0 0
1 4 0 0
1 5 0 0
1 6 56 24
1 7 20 10
...
50 1 0 0
50 2 0 0
50 3 0 0
50 4 0 0
50 5 0 0
50 6 10 84
50 7 29 10
In the first dataframe, df_1, I have coefficients for Monday to Friday. In the second dataframe, df_2, I have coefficients for the weekend. My goal is to merge both dataframes so that the obsolete 0 values disappear.
What is the best approach to do that?
I found that df.replace seems to be a good approach.
Assuming that your dataframes share the same structure, you can capitalise on pandas' ability to align automatically on indexes. Replace the 0s with np.nan in df1, then use fillna:
import numpy as np

df1.replace({0: np.nan}, inplace=True)
df1.fillna(df2)
Week Day Coeff_1 Coeff_n
0 1.0 1.0 12.0 23.0
1 1.0 2.0 11.0 19.0
2 1.0 3.0 23.0 68.0
3 1.0 4.0 57.0 81.0
4 1.0 5.0 35.0 16.0
5 1.0 6.0 56.0 24.0
6 1.0 7.0 20.0 10.0
7 50.0 1.0 12.0 23.0
8 50.0 2.0 11.0 19.0
9 50.0 3.0 23.0 68.0
10 50.0 4.0 57.0 81.0
11 50.0 5.0 35.0 16.0
12 50.0 6.0 10.0 84.0
13 50.0 7.0 29.0 10.0
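As a self-contained check of the replace + fillna idea (tiny made-up frames with a single coefficient column):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Day': [1, 2, 6, 7], 'Coeff_1': [12, 11, 0, 0]})
df2 = pd.DataFrame({'Day': [1, 2, 6, 7], 'Coeff_1': [0, 0, 56, 20]})

# Treat 0 as missing in df1, then fill the holes from df2,
# which pandas aligns automatically on the shared index
merged = df1.replace({0: np.nan}).fillna(df2)
print(merged['Coeff_1'].tolist())  # [12.0, 11.0, 56.0, 20.0]
```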
Can't you just append the rows of df_1 where Day is 1-5 to the rows of df_2 where Day is 6-7?
df_3 = df_1[df_1.Day.isin(range(1,6))].append(df_2[df_2.Day.isin(range(6,8))])
To get a normal sorting, you can sort your values by week and day:
df_3.sort_values(['Week','Day'])
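Note that DataFrame.append was removed in pandas 2.0; pd.concat expresses the same idea. A minimal sketch with made-up coefficients for a single week:

```python
import pandas as pd

df_1 = pd.DataFrame({'Week': [1] * 7, 'Day': list(range(1, 8)),
                     'Coeff_1': [12, 11, 23, 57, 35, 0, 0]})
df_2 = pd.DataFrame({'Week': [1] * 7, 'Day': list(range(1, 8)),
                     'Coeff_1': [0, 0, 0, 0, 0, 56, 20]})

# Weekday rows (1-5) from df_1, weekend rows (6-7) from df_2
df_3 = pd.concat([df_1[df_1.Day.isin(range(1, 6))],
                  df_2[df_2.Day.isin(range(6, 8))]])
df_3 = df_3.sort_values(['Week', 'Day']).reset_index(drop=True)
print(df_3['Coeff_1'].tolist())  # [12, 11, 23, 57, 35, 56, 20]
```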

Based on some rules, how to expand data in Pandas?

Please forgive my English. I hope I can explain this clearly.
Assume we have this data:
>>> data = {'Span':[3,3.5], 'Low':[6.2,5.16], 'Medium':[4.93,4.1], 'High':[3.68,3.07], 'VeryHigh':[2.94,2.45], 'ExtraHigh':[2.48,2.06], '0.9':[4.9,3.61], '1.5':[3.23,2.38], '2':[2.51,1.85]}
>>> df = pd.DataFrame(data)
>>> df
Span Low Medium High VeryHigh ExtraHigh 0.9 1.5 2
0 3.0 6.20 4.93 3.68 2.94 2.48 4.90 3.23 2.51
1 3.5 5.16 4.10 3.07 2.45 2.06 3.61 2.38 1.85
I want to get this data:
Span Wind Snow MaxSpacing
0 3.0 Low 0.0 6.20
1 3.0 Medium 0.0 4.93
2 3.0 High 0.0 3.68
3 3.0 VeryHigh 0.0 2.94
4 3.0 ExtraHigh 0.0 2.48
5 3.0 0 0.9 4.90
6 3.0 0 1.5 3.23
7 3.0 0 2.0 2.51
8 3.5 Low 0.0 5.16
9 3.5 Medium 0.0 4.10
10 3.5 High 0.0 3.07
11 3.5 VeryHigh 0.0 2.45
12 3.5 ExtraHigh 0.0 2.06
13 3.5 0 0.9 3.61
14 3.5 0 1.5 2.38
15 3.5 0 2.0 1.85
The principles that apply to df:
Span expands by the combination of Wind and Snow to get the MaxSpacing.
Wind and Snow are mutually exclusive. When Wind is one of 'Low', 'Medium', 'High', 'VeryHigh', 'ExtraHigh', Snow is zero; when Snow is one of 0.9, 1.5, 2, Wind is zero.
Please help. Thank you.
Use DataFrame.melt to unpivot, then sort by the original indices. Create the Snow column with to_numeric and Series.fillna inside DataFrame.insert, and finally set Wind to 0 where the column name was numeric:
df = (df.melt('Span', ignore_index=False, var_name='Wind', value_name='MaxSpacing')
.sort_index(ignore_index=True))
s = pd.to_numeric(df['Wind'], errors='coerce')
df.insert(2, 'Snow', s.fillna(0))
df.loc[s.notna(), 'Wind'] = 0
print (df)
Span Wind Snow MaxSpacing
0 3.0 Low 0.0 6.20
1 3.0 Medium 0.0 4.93
2 3.0 High 0.0 3.68
3 3.0 VeryHigh 0.0 2.94
4 3.0 ExtraHigh 0.0 2.48
5 3.0 0 0.9 4.90
6 3.0 0 1.5 3.23
7 3.0 0 2.0 2.51
8 3.5 Low 0.0 5.16
9 3.5 Medium 0.0 4.10
10 3.5 High 0.0 3.07
11 3.5 VeryHigh 0.0 2.45
12 3.5 ExtraHigh 0.0 2.06
13 3.5 0 0.9 3.61
14 3.5 0 1.5 2.38
15 3.5 0 2.0 1.85
Alternative solution with DataFrame.set_index and DataFrame.stack:
df = df.set_index('Span').rename_axis('Wind', axis=1).stack().reset_index(name='MaxSpacing')
s = pd.to_numeric(df['Wind'], errors='coerce')
df.insert(2, 'Snow', s.fillna(0))
df.loc[s.notna(), 'Wind'] = 0
print (df)
Span Wind Snow MaxSpacing
0 3.0 Low 0.0 6.20
1 3.0 Medium 0.0 4.93
2 3.0 High 0.0 3.68
3 3.0 VeryHigh 0.0 2.94
4 3.0 ExtraHigh 0.0 2.48
5 3.0 0 0.9 4.90
6 3.0 0 1.5 3.23
7 3.0 0 2.0 2.51
8 3.5 Low 0.0 5.16
9 3.5 Medium 0.0 4.10
10 3.5 High 0.0 3.07
11 3.5 VeryHigh 0.0 2.45
12 3.5 ExtraHigh 0.0 2.06
13 3.5 0 0.9 3.61
14 3.5 0 1.5 2.38
15 3.5 0 2.0 1.85
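For convenience, the melt solution as one self-contained script (same data as the question; kind='stable' is used so that rows within each Span keep their melt order):

```python
import pandas as pd

data = {'Span': [3, 3.5], 'Low': [6.2, 5.16], 'Medium': [4.93, 4.1],
        'High': [3.68, 3.07], 'VeryHigh': [2.94, 2.45],
        'ExtraHigh': [2.48, 2.06],
        '0.9': [4.9, 3.61], '1.5': [3.23, 2.38], '2': [2.51, 1.85]}
df = pd.DataFrame(data)

# Unpivot every non-Span column; keep the original row labels so that
# sorting by them groups all rows of the same Span together
out = (df.melt('Span', ignore_index=False,
               var_name='Wind', value_name='MaxSpacing')
         .sort_index(kind='stable', ignore_index=True))

# Numeric column names become Snow values; text names remain Wind
snow = pd.to_numeric(out['Wind'], errors='coerce')
out.insert(2, 'Snow', snow.fillna(0))
out.loc[snow.notna(), 'Wind'] = 0
print(out.head(8))
```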

How to extract a database based on a condition in pandas?

Please help me. The problem is the following:
write an expression to extract a new dataframe containing those days where the temperature reached at least 70 degrees, and assign that to the variable at_least_70. (You might need to think some about what the different columns in the full dataframe represent to decide how to extract the subset of interest.)
After that, write another expression that computes how many days reached at least 70 degrees, and assign that to the variable num_at_least_70.
This is the original DataFrame
Date Maximum Temperature Minimum Temperature \
0 2018-01-01 5 0
1 2018-01-02 13 1
2 2018-01-03 19 -2
3 2018-01-04 22 1
4 2018-01-05 18 -2
.. ... ... ...
360 2018-12-27 33 23
361 2018-12-28 40 21
362 2018-12-29 50 37
363 2018-12-30 37 24
364 2018-12-31 35 25
Average Temperature Precipitation Snowfall Snow Depth
0 2.5 0.04 1.0 3.0
1 7.0 0.03 0.6 4.0
2 8.5 0.00 0.0 4.0
3 11.5 0.00 0.0 3.0
4 8.0 0.09 1.2 4.0
.. ... ... ... ...
360 28.0 0.00 0.0 1.0
361 30.5 0.07 0.0 0.0
362 43.5 0.04 0.0 0.0
363 30.5 0.02 0.7 1.0
364 30.0 0.00 0.0 0.0
[365 rows x 7 columns]
The code I wrote for the above problem is:
at_least_70 = dfc.loc[dfc['Minimum Temperature']>=70,['Date']]
print(at_least_70)
num_at_least_70 = at_least_70.count()
print(num_at_least_70)
The result it shows:
Date
204 2018-07-24
240 2018-08-29
245 2018-09-03
Date 3
dtype: int64
But when I run the test case it shows:
Incorrect!
You are not correctly extracting the subset.
As suggested by @HenryYik, remove the column selector:
at_least_70 = dfc.loc[dfc['Maximum Temperature'] >= 70,
['Date', 'Maximum Temperature']]
num_at_least_70 = len(at_least_70)
Use boolean indexing, and to count the True values of the mask, use sum:
mask = dfc['Maximum Temperature'] >= 70
at_least_70 = dfc[mask]
num_at_least_70 = mask.sum()
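A toy sketch (made-up temperatures) showing both the row subset and the two equivalent ways of counting:

```python
import pandas as pd

dfc = pd.DataFrame({
    'Date': ['2018-07-24', '2018-08-29', '2018-09-03', '2018-12-29'],
    'Maximum Temperature': [85, 72, 70, 50],
})

# Keep whole rows where the day's high reached at least 70
mask = dfc['Maximum Temperature'] >= 70
at_least_70 = dfc[mask]

num_at_least_70 = len(at_least_70)  # or mask.sum()
print(num_at_least_70)  # 3
```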

Python Pandas Where Condition Is Not Working

I have created a filter condition with Python.
filter = data['Ueber'] > 2.3
data[filter]
Here you can see my dataset.
Saison Spieltag Heimteam ... Ueber Unter UeberUnter
0 1819 3 Bayern München ... 1.30 3.48 Ueber
1 1819 3 Werder Bremen ... 1.75 2.12 Unter
2 1819 3 SC Freiburg ... 2.20 1.69 Ueber
3 1819 3 VfL Wolfsburg ... 2.17 1.71 Ueber
4 1819 3 Fortuna Düsseldorf ... 1.46 2.71 Ueber
Unfortunately, my greater than condition is not working. What's the problem?
Thanks
Just for clarity: if your column really contains floats, a conditional check like this should work.
Example DataFrame:
>>> df = pd.DataFrame({'num': [-12.5, 60.0, 50.0, -25.10, 50.0, 51.0, 71.0]} , dtype=float)
>>> df
num
0 -12.5
1 60.0
2 50.0
3 -25.1
4 50.0
5 51.0
6 71.0
Conditional check to compare:
>>> df['num'] > 50.0
0 False
1 True
2 False
3 False
4 False
5 True
6 True
Name: num, dtype: bool
Result:
>>> df[df['num'] > 50.0]
num
1 60.0
5 51.0
6 71.0
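One common reason such a comparison seems "not to work" is that the column was read in as strings (object dtype). This is only a guess about the original data, but it is easy to check and fix:

```python
import pandas as pd

# Column accidentally stored as strings (e.g. after a messy CSV import)
data = pd.DataFrame({'Ueber': ['1.30', '1.75', '2.20', '2.17', '1.46']})
print(data['Ueber'].dtype)  # object, not float64

# Convert to numeric first; then the filter behaves as expected
data['Ueber'] = pd.to_numeric(data['Ueber'], errors='coerce')
over = data[data['Ueber'] > 2.0]
print(len(over))  # 2 (the 2.20 and 2.17 rows)
```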

How can I replace a column of values by another with different number of rows?

I have the following df:
df=
A B C
2016-04-16 3 2 2
2016-04-17 4 4 1
2016-04-18 7 3 1
2016-04-19 5 1 3
2016-04-20 5 1 7
On the other hand I have a recalculated column (with 4 rows instead of 5):
df1=
C
2016-04-16 3.2
2016-04-17 4.7
2016-04-18 7.1
2016-04-19 3.3
How could I put the recalculated column into the original df to obtain the following output?
df=
A B C
2016-04-16 3 2 3.2
2016-04-17 4 4 4.7
2016-04-18 7 3 7.1
2016-04-19 5 1 3.3
2016-04-20 5 1 7
Here's one way using combine_first
In [1327]: df.assign(C=df1.C.combine_first(df.C))
Out[1327]:
A B C
2016-04-16 3 2 3.2
2016-04-17 4 4 4.7
2016-04-18 7 3 7.1
2016-04-19 5 1 3.3
2016-04-20 5 1 7.0
Or,
In [1331]: df1.combine_first(df)
Out[1331]:
A B C
2016-04-16 3.0 2.0 3.2
2016-04-17 4.0 4.0 4.7
2016-04-18 7.0 3.0 7.1
2016-04-19 5.0 1.0 3.3
2016-04-20 5.0 1.0 7.0
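If modifying df in place is acceptable, DataFrame.update does the same index alignment and overwrites only the matching cells. A sketch with the question's data (C stored as floats so the dtype is unchanged):

```python
import pandas as pd

dates = pd.to_datetime(['2016-04-16', '2016-04-17', '2016-04-18',
                        '2016-04-19', '2016-04-20'])
df = pd.DataFrame({'A': [3, 4, 7, 5, 5],
                   'B': [2, 4, 3, 1, 1],
                   'C': [2.0, 1.0, 1.0, 3.0, 7.0]}, index=dates)
df1 = pd.DataFrame({'C': [3.2, 4.7, 7.1, 3.3]}, index=dates[:4])

df.update(df1)  # aligns on index, overwrites C for the four matching dates
print(df['C'].tolist())  # [3.2, 4.7, 7.1, 3.3, 7.0]
```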