Get column name in data frame if value is between values - pandas

I have a dataframe:
import numpy as np
import pandas as pd
random_number_gen = np.random.default_rng()
df = pd.DataFrame(random_number_gen.integers(-5, 5, size=(1, 13)), columns=list('ABCDEFGHIJKLM'))
 A  B  C  D  E  F  G  H  I  J  K  L  M
 0  1  4 -4 -1  3 -5 -3  0 -4 -1  3  2
I would like to obtain the names of the columns where a value falls between -1 and 1. I tried this and others:
df.columns[(( -1<= df.any()) & (df.any() <=1)).iloc[0]]
Any help is welcome. Thanks.

If you have a single row:
df.columns[df.iloc[0].between(-1,1)]
# or
df.columns[df.squeeze().between(-1,1)]
If you can have multiple rows:
df.columns[(df.ge(-1)&df.le(1)).any()]
Example output:
Index(['E', 'G', 'J'], dtype='object')
Used input:
A B C D E F G H I J K L M
0 3 -3 -4 -3 -1 3 -1 -5 -2 1 3 2 4
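As a minimal sketch of how the single-row and multi-row variants differ, the example below uses a small hand-made frame (the values are illustrative assumptions, not the question's random data). Series.between is inclusive on both ends by default, which matches the -1 <= x <= 1 condition.

```python
import pandas as pd

# Illustrative data (an assumption, not the question's random frame).
df = pd.DataFrame(
    [[3, -1, 4, 0], [1, 5, -2, 7]],
    columns=list("ABCD"),
)

# Single row: Series.between is inclusive on both ends by default.
first_row = df.iloc[0]
cols_first_row = df.columns[first_row.between(-1, 1)].tolist()

# Multiple rows: keep a column if ANY row has a value in [-1, 1].
in_range = df.ge(-1) & df.le(1)
cols_any_row = df.columns[in_range.any()].tolist()

# Per-row variant: list the matching column names for each row.
cols_per_row = [in_range.columns[mask].tolist() for mask in in_range.to_numpy()]

print(cols_first_row)
print(cols_any_row)
print(cols_per_row)
```

Replacing .any() with .all() would instead keep only the columns whose values are in range in every row.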

regarding controlling the setup of index column [duplicate]

I have a dataframe which I want to plot with matplotlib, but the index column is the time and I cannot plot it.
This is the dataframe (df3):
but when I try the following:
plt.plot(df3['magnetic_mag mean'], df3['YYYY-MO-DD HH-MI-SS_SSS'], label='FDI')
I'm getting an error obviously:
KeyError: 'YYYY-MO-DD HH-MI-SS_SSS'
So what I want to do is add a new column to my dataframe (named 'Time') which is just a copy of the index column.
How can I do it?
This is the entire code:
#Importing the csv file into df
df = pd.read_csv('university2.csv', sep=";", skiprows=1)
#Changing datetime
df['YYYY-MO-DD HH-MI-SS_SSS'] = pd.to_datetime(df['YYYY-MO-DD HH-MI-SS_SSS'],
format='%Y-%m-%d %H:%M:%S:%f')
#Set index from column
df = df.set_index('YYYY-MO-DD HH-MI-SS_SSS')
#Add Magnetic Magnitude Column
df['magnetic_mag'] = np.sqrt(df['MAGNETIC FIELD X (μT)']**2 + df['MAGNETIC FIELD Y (μT)']**2 + df['MAGNETIC FIELD Z (μT)']**2)
#Subtract Earth's Average Magnetic Field from 'magnetic_mag'
df['magnetic_mag'] = df['magnetic_mag'] - 30
#Copy interesting values
df2 = df[[ 'ATMOSPHERIC PRESSURE (hPa)',
'TEMPERATURE (C)', 'magnetic_mag']].copy()
#Hourly Average and Standard Deviation for interesting values
df3 = df2.resample('H').agg(['mean','std'])
df3.columns = [' '.join(col) for col in df3.columns]
df3.reset_index()
plt.plot(df3['magnetic_mag mean'], df3['YYYY-MO-DD HH-MI-SS_SSS'], label='FDI')
Thank you !!
I think you need reset_index:
df3 = df3.reset_index()
Possible solution, but I think inplace is not good practice:
df3.reset_index(inplace=True)
But if you need new column, use:
df3['new'] = df3.index
I think you can read_csv better:
df = pd.read_csv('university2.csv',
sep=";",
skiprows=1,
index_col='YYYY-MO-DD HH-MI-SS_SSS',
parse_dates=['YYYY-MO-DD HH-MI-SS_SSS']) # parse_dates expects a list of column names; if this doesn't work, use pd.to_datetime
And then omit:
#Changing datetime
df['YYYY-MO-DD HH-MI-SS_SSS'] = pd.to_datetime(df['YYYY-MO-DD HH-MI-SS_SSS'],
format='%Y-%m-%d %H:%M:%S:%f')
#Set index from column
df = df.set_index('YYYY-MO-DD HH-MI-SS_SSS')
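A minimal sketch of the two options above, using a small synthetic time series in place of the CSV. The index name "timestamp", the sample values, and the "60min" rule (standing in for the hourly 'H' rule) are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: one reading every 20 minutes for 3 hours.
idx = pd.date_range("2021-01-01 00:00", periods=9, freq="20min", name="timestamp")
df = pd.DataFrame({"magnetic_mag": np.arange(9, dtype=float)}, index=idx)

# Hourly mean and standard deviation, then flatten the two-level column names.
df3 = df.resample("60min").agg(["mean", "std"])
df3.columns = [" ".join(col) for col in df3.columns]

# reset_index returns a NEW frame; without assignment df3 is unchanged.
df3.reset_index()
still_index_only = "timestamp" not in df3.columns

# Option 1: assign the result, turning the index into a regular column.
flat = df3.reset_index()

# Option 2: keep the index and also copy it into a column named 'Time'.
df3["Time"] = df3.index

print(still_index_only)
print(flat.columns.tolist())
print(flat["magnetic_mag mean"].tolist())
```

With either option the time values can be passed as the x argument, for example plt.plot(flat['timestamp'], flat['magnetic_mag mean']), or simply plt.plot(df3.index, df3['magnetic_mag mean']) without adding any column.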
EDIT: If MultiIndex or Index is from groupby operation, possible solutions are:
df = pd.DataFrame({'A':list('aaaabbbb'),
'B':list('ccddeeff'),
'C':range(8),
'D':range(4,12)})
print (df)
A B C D
0 a c 0 4
1 a c 1 5
2 a d 2 6
3 a d 3 7
4 b e 4 8
5 b e 5 9
6 b f 6 10
7 b f 7 11
df1 = df.groupby(['A','B']).sum()
print (df1)
      C   D
A B
a c   1   9
  d   5  13
b e   9  17
  f  13  21
Add parameter as_index=False:
df2 = df.groupby(['A','B'], as_index=False).sum()
print (df2)
A B C D
0 a c 1 9
1 a d 5 13
2 b e 9 17
3 b f 13 21
Or add reset_index:
df2 = df.groupby(['A','B']).sum().reset_index()
print (df2)
A B C D
0 a c 1 9
1 a d 5 13
2 b e 9 17
3 b f 13 21
You can access the index directly and plot it. The following is an example:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
#Get index in horizontal axis
plt.plot(df.index, df[0])
plt.show()
#Get index in vertical axis
plt.plot(df[0], df.index)
plt.show()
You can also use eval to achieve this:
In [2]: df = pd.DataFrame({'num': range(5), 'date': pd.date_range('2022-06-30', '2022-07-04')}, index=list('ABCDE'))
In [3]: df
Out[3]:
num date
A 0 2022-06-30
B 1 2022-07-01
C 2 2022-07-02
D 3 2022-07-03
E 4 2022-07-04
In [4]: df.eval('index_copy = index')
Out[4]:
num date index_copy
A 0 2022-06-30 A
B 1 2022-07-01 B
C 2 2022-07-02 C
D 3 2022-07-03 D
E 4 2022-07-04 E

Append two pandas dataframe with different shapes and in for loop using python or pandasql

I have two dataframes such as:
df1:
id A B C D
1 a b c d
1 e f g h
1 i j k l
df2:
id A C D
2 x y z
2 u v w
The final outcome should be:
id A B C D
1  a b c d
1  e f g h
1  i j k l
2  x   y z
2  u   v w
These tables are generated in a for loop from JSON files, so I have to keep appending them one below another.
Note: The 'id' column always differs between the two dataframes.
My approach:
data is a dataframe in which column 'X' holds the JSON data; it also has an "id" column.
df1=pd.DataFrame()
for i, row1 in data.head(2).iterrows():
    df2= pd.io.json.json_normalize(row1["X"])
    df2.columns = df2.columns.map(lambda x: x.split(".")[-1])
    df2["id"]=[row1["id"] for i in range(df2.shape[0])]
    if len(df1)==0:
        df1=df2.copy()
    df1=pd.concat((df1,df2), ignore_index=True)
Error: AssertionError: Number of manager items must equal union of block items # manager items: 46, # tot_items: 49
How to solve this using python or pandas sql.
You can use pd.concat to concatenate the two dataframes; columns missing from one frame are filled with NaN:
>>> pd.concat((df1,df2), ignore_index=True)
id A B C D
0 1 a b c d
1 1 e f g h
2 1 i j k l
3 2 x NaN y z
4 2 u NaN v w
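A minimal sketch of the same idea inside a loop: collect the pieces in a list and call pd.concat once at the end. One plausible cause of the AssertionError (an assumption, since the question does not show the JSON) is that x.split(".")[-1] maps several nested keys to the same name, leaving duplicate column labels. The duplicate-dropping line below is a hypothetical guard for that case, not something the original answer proposed.

```python
import pandas as pd

# The two frames from the question, rebuilt by hand.
df1 = pd.DataFrame(
    {"id": [1, 1, 1], "A": list("aei"), "B": list("bfj"),
     "C": list("cgk"), "D": list("dhl")}
)
df2 = pd.DataFrame({"id": [2, 2], "A": list("xu"), "C": list("yv"), "D": list("zw")})

# Collect the pieces in a list and concatenate once after the loop.
frames = []
for piece in (df1, df2):
    # Assumption: duplicate labels are dropped before concatenation,
    # keeping the first occurrence of each column name.
    piece = piece.loc[:, ~piece.columns.duplicated()]
    frames.append(piece)

result = pd.concat(frames, ignore_index=True)
print(result)
```

Concatenating once also avoids the first piece being added twice, which happens in the question's loop because df1 is set to a copy of df2 and then df2 is concatenated onto it again.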

Adding new column to an existing dataframe at an arbitrary position [duplicate]

Can I insert a column at a specific column index in pandas?
import pandas as pd
df = pd.DataFrame({'l':['a','b','c','d'], 'v':[1,2,1,2]})
df['n'] = 0
This will put column n as the last column of df, but isn't there a way to tell df to put n at the beginning?
see docs: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.insert.html
using loc = 0 will insert at the beginning
df.insert(loc, column, value)
df = pd.DataFrame({'B': [1, 2, 3], 'C': [4, 5, 6]})
df
Out:
B C
0 1 4
1 2 5
2 3 6
idx = 0
new_col = [7, 8, 9] # can be a list, a Series, an array or a scalar
df.insert(loc=idx, column='A', value=new_col)
df
Out:
A B C
0 7 1 4
1 8 2 5
2 9 3 6
If you want a single value for all rows:
df.insert(0,'name_of_column','')
df['name_of_column'] = value
Edit:
You can also:
df.insert(0,'name_of_column',value)
df.insert(loc, column_name, value)
This works if no other column has the same name. If a column with the given name already exists in the dataframe, a ValueError is raised.
You can pass the optional parameter allow_duplicates=True to create a new column with an already existing column name.
Here is an example:
>>> df = pd.DataFrame({'b': [1, 2], 'c': [3,4]})
>>> df
b c
0 1 3
1 2 4
>>> df.insert(0, 'a', -1)
>>> df
a b c
0 -1 1 3
1 -1 2 4
>>> df.insert(0, 'a', -2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python39\lib\site-packages\pandas\core\frame.py", line 3760, in insert
    self._mgr.insert(loc, column, value, allow_duplicates=allow_duplicates)
  File "C:\Python39\lib\site-packages\pandas\core\internals\managers.py", line 1191, in insert
    raise ValueError(f"cannot insert {item}, already exists")
ValueError: cannot insert a, already exists
>>> df.insert(0, 'a', -2, allow_duplicates = True)
>>> df
a a b c
0 -2 -1 1 3
1 -2 -1 2 4
You could try to extract columns as list, massage this as you want, and reindex your dataframe:
>>> cols = df.columns.tolist()
>>> cols = [cols[-1]]+cols[:-1] # or whatever change you need
>>> df.reindex(columns=cols)
n l v
0 0 a 1
1 0 b 2
2 0 c 1
3 0 d 2
EDIT: this can be done in one line; however, it looks a bit ugly. Maybe a cleaner proposal will come along...
>>> df.reindex(columns=['n']+df.columns[:-1].tolist())
n l v
0 0 a 1
1 0 b 2
2 0 c 1
3 0 d 2
Here is a very simple one-line answer.
After adding the 'n' column to your df, reorder the columns as follows.
import pandas as pd
df = pd.DataFrame({'l':['a','b','c','d'], 'v':[1,2,1,2]})
df['n'] = 0
df
l v n
0 a 1 0
1 b 2 0
2 c 1 0
3 d 2 0
# here you can add the below code and it should work.
df = df[list('nlv')]
df
n l v
0 0 a 1
1 0 b 2
2 0 c 1
3 0 d 2
However, list('nlv') only works because each column name is a single letter. If your column names are words, pass them as a list of names inside the indexing brackets:
import pandas as pd
df = pd.DataFrame({'Upper':['a','b','c','d'], 'Lower':[1,2,1,2]})
df['Net'] = 0
df['Mid'] = 2
df['Zsore'] = 2
df
Upper Lower Net Mid Zsore
0 a 1 0 2 2
1 b 2 0 2 2
2 c 1 0 2 2
3 d 2 0 2 2
# here you can add below line and it should work
df = df[list(('Mid','Upper', 'Lower', 'Net','Zsore'))]
df
Mid Upper Lower Net Zsore
0 2 a 1 0 2
1 2 b 2 0 2
2 2 c 1 0 2
3 2 d 2 0 2
A general 4-line routine
You can have the following 4-line routine whenever you want to create a new column and insert into a specific location loc.
df['new_column'] = ... #new column's definition
col = df.columns.tolist()
col.insert(loc, col.pop()) #loc is the column's index you want to insert into
df = df[col]
In your example, it is simple:
df['n'] = 0
col = df.columns.tolist()
col.insert(0, col.pop())
df = df[col]
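The routine above can be wrapped in a small helper, sketched below. The function name move_last_column is hypothetical, and the comparison with DataFrame.insert is only there to show that both routes give the same column order for the question's data.

```python
import pandas as pd


def move_last_column(df: pd.DataFrame, loc: int) -> pd.DataFrame:
    """Return a new frame with the last column moved to position `loc`."""
    cols = df.columns.tolist()
    cols.insert(loc, cols.pop())
    return df[cols]


# Route 1: append the column, then reorder.
df = pd.DataFrame({"l": ["a", "b", "c", "d"], "v": [1, 2, 1, 2]})
df["n"] = 0
reordered = move_last_column(df, 0)

# Route 2: DataFrame.insert, which modifies the frame in place.
df2 = pd.DataFrame({"l": ["a", "b", "c", "d"], "v": [1, 2, 1, 2]})
df2.insert(0, "n", 0)

print(reordered.columns.tolist())
print(df2.columns.tolist())
```

The helper returns a reordered copy and leaves the input frame untouched, whereas insert changes the existing frame and returns None.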

Comparing and replacing column items pandas dataframe

I have three columns C1, C2, C3 in a pandas dataframe. My aim is to replace C3_i by C2_j whenever C3_i=C1_j. These are all strings. I tried where but failed. What is a good way to do this without a for loop?
If my data frame is
df=pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': ['d','e','f'], 'c3': ['c', 'z', 'b']})
Then I want c3 to be replaced by ['f','z','e']
I tried this, which takes very long time.
for i in range(0,len(df)):
    for j in range(0,len(df)):
        if (df.iloc[i]['c1']==df.iloc[j]['c3']):
            df.iloc[j]['c3']=df.iloc[i]['c2']
Use map with a Series created by set_index:
df['c3'] = df['c3'].map(df.set_index('c1')['c2']).fillna(df['c3'])
Alternative solution with update:
df['c3'].update(df['c3'].map(df.set_index('c1')['c2']))
print (df)
c1 c2 c3
0 a d f
1 b e z
2 c f e
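To make the map approach easier to follow, here is a step-by-step sketch on the question's data. The intermediate names lookup and mapped are introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"c1": ["a", "b", "c"], "c2": ["d", "e", "f"], "c3": ["c", "z", "b"]})

# Lookup table: index is c1, values are c2 (a -> d, b -> e, c -> f).
lookup = df.set_index("c1")["c2"]

# map() returns NaN where a c3 value has no match in the lookup index.
mapped = df["c3"].map(lookup)

# fillna() restores the original c3 value for those unmatched rows.
df["c3"] = mapped.fillna(df["c3"])

print(mapped.isna().tolist())
print(df["c3"].tolist())
```

This sketch assumes the values in c1 are unique. If c1 contained duplicates the lookup would be ambiguous, so dropping duplicates before set_index is one way to handle that case.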
Example data:
dataframe = pd.DataFrame({'a':['10','4','3','40','5'], 'b':['5','4','3','2','1'], 'c':['s','d','f','g','h']})
Output:
a b c
0 10 5 s
1 4 4 d
2 3 3 f
3 40 2 g
4 5 1 h
Code:
def replace(df):
    if len(dataframe[dataframe.b==df.a]) != 0:
        df['a'] = dataframe[dataframe.b==df.a].c.values[0]
    return df
dataframe = dataframe.apply(replace, 1)
Output:
a b c
0 1 5 0
1 2 4 0
2 0 3 0
3 4 2 0
4 5 1 0
Is it what you want?

Creating new columns with value names of other columns in pandas

I have a DataFrame as shown below.
DF =
id w R
1 A L
2 B J
3 C L,J
I now want to create new columns that show whether each value from column R appears in the row.
DF2 =
id w R L J
1 A L 1 0
2 B J 0 1
3 C L,J 1 1
I tried this line, but the result wasn't what I wanted:
for x in DF.R.unique():
    DF[x]=(DF.R==x).astype(int)
DF2 =
id w R L J L,J
1 A L 1 0 0
2 B J 0 1 0
3 C L,J 0 0 1
What is needed to fix this? The DF is also very big and slow methods won't work.
You need to specify the separator, which in your example is a comma:
df.R.str.get_dummies(sep=',')
Out[192]:
J L
0 0 1
1 1 0
2 1 1
I would use pandas' built-in str methods:
chars_to_count = ['L', 'J']
for char in chars_to_count:
    DF[char] = DF['R'].str.count(char)
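As a sketch of how the get_dummies result could be attached to the original frame to get the DF2 layout from the question, the example below rebuilds the small frame by hand and joins the indicator columns back on the shared row index.

```python
import pandas as pd

DF = pd.DataFrame({"id": [1, 2, 3], "w": ["A", "B", "C"], "R": ["L", "J", "L,J"]})

# One indicator column per distinct token found between the separators.
dummies = DF["R"].str.get_dummies(sep=",")

# Attach the indicators to the original frame; the row index is shared.
# Selecting ["L", "J"] only fixes the column order to match the question.
DF2 = DF.join(dummies[["L", "J"]])

print(DF2)
```

Note the difference between the two answers: str.count counts substring occurrences, so it can return values above 1 and would also match a letter inside a longer token, while get_dummies splits on the separator and yields 0/1 indicators per whole token.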