This seemingly simple operation at point d) still eludes me after numerous attempts to solve it on my own.
The for loop I use:
a) cycles through an unknown number of Excel files,
b) selects 3 columns from each file,
c) performs some string manipulations on their headers using conditions, then
d) outputs the 1-row extraction of the headers achieved so far into an individual list.
After n (here 3) iterations of a), b) and c), the for loop outputs lists such as:
['Col1','Col1a','Col1b']
['Col2','Col2a','Col2b']
['Col3','Col3a','Col3b']
I am looking to append/concatenate/merge these individual lists, each as its own row, into one dataframe that I can further manipulate.
Expected final dataframe with index=True and header=None:
0, 'Col1','Col1a','Col1b'
1, 'Col2','Col2a','Col2b'
2, 'Col3','Col3a','Col3b'
I have tried many examples found on SO, such as:
df = pd.DataFrame()
for lst in [list1, list2, list3]:
    df_temp = pd.DataFrame(lst)
    df = df.append(df_temp)
print(df)
Thanks for the time you take to review this request.
You can create the dataframe from the lists:
pd.DataFrame([list1, list2, list3]).to_csv('file.csv', header=None, index=True)
file.csv:
0,Col1,Col1a,Col1b
1,Col2,Col2a,Col2b
2,Col3,Col3a,Col3b
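If the lists are produced one per iteration of the loop in the question, a minimal sketch (the name header_lists and the literal lists standing in for steps a)-c) are assumptions) would collect them first and build the dataframe once at the end:
import pandas as pd

header_lists = []                                   # collected across iterations
for extracted in (['Col1', 'Col1a', 'Col1b'],
                  ['Col2', 'Col2a', 'Col2b'],
                  ['Col3', 'Col3a', 'Col3b']):      # stand-in for steps a)-c)
    header_lists.append(extracted)                  # one header list per file

df = pd.DataFrame(header_lists)                     # each list becomes one row
df.to_csv('file.csv', header=None, index=True)      # matches the expected output above
Building the dataframe once at the end also avoids the cost of repeatedly appending to a dataframe inside the loop.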
I have a dataframe with a lot of columns in it. Now I want to select only certain columns. I have saved all the names of the columns that I want to select into a Python list and now I want to filter my dataframe according to this list.
I've been trying to do:
df_new = df[[list]]
where list includes all the column names that I want to select.
However I get the error:
TypeError: unhashable type: 'list'
Any help on this one?
You can remove one []:
df_new = df[list]
It is also better to use a name other than list (which shadows the built-in type), e.g. L:
df_new = df[L]
Your loop looks like it works; it can be simplified a bit:
L = []
for x in df.columns:
    if "_" not in x[-3:]:
        L.append(x)
print(L)
List comprehension:
print([x for x in df.columns if "_" not in x[-3:]])
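A quick self-contained demo of the same selection (the column names are made up for illustration):
import pandas as pd

# hypothetical columns; the trailing "_id" ones should be dropped
df = pd.DataFrame(columns=['price', 'qty', 'price_id', 'qty_id'])

L = [x for x in df.columns if "_" not in x[-3:]]
df_new = df[L]    # keeps only the columns listed in L

print(L)          # ['price', 'qty']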
I have a dictionary from which I can extract values with the get method, but I need to subset these values. For example:
dict_of_measures = {k: v for k, v in measures.groupby('Measure')}
And I am using get
BCS=dict_of_measures.get('BCS')
I have several values, and I wanted to know if I could use a for loop to extract from the dictionary and subset it into multiple dataframes, one per measure, using the get method. Is this possible? Something like:
for measure_name in dict_of_measures:
    get measure_name()
You can use a dict comprehension:
result = []
keys_to_extract = ['key1', 'key2']
new_dict = {k: bigdict[k] for k in keys_to_extract}
result.append(new_dict)  # add the dictionary to a list; this can then be converted into a pandas dataframe
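Applied to dict_of_measures from the question, a minimal sketch (every measure name other than 'BCS' is a placeholder):
# measure names to pull out; 'BCS' comes from the question, the rest are placeholders
wanted = ['BCS', 'MeasureX', 'MeasureY']

# one dataframe per measure, fetched with .get (returns None if a measure is absent)
subsets = {name: dict_of_measures.get(name) for name in wanted}

BCS = subsets['BCS']    # equivalent to dict_of_measures.get('BCS')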
I am wondering if anyone can help. I have a number of dataframes stored in a dictionary, and I simply want to access each of these dataframes and count the values in one column. In that column I have 10 letters; in the first dataframe there are 5 b's and 5 a's, so for that dataframe I would expect the count output to be a = 5 and b = 5. However, the count will be different for each dataframe, so I would like to store the output of these counts either in another dictionary or in a separate variable.
The dictionary is called Dict and the column in all the dataframes is called letters. I have tried to do this by accessing the keys in the dictionary but cannot get it to work. A section of what I have tried is shown below.
import pandas as pd
for key in Dict:
    Count = pd.value_counts(key['letters'])
Ideally, Count would change with each iteration so that every new count output is stored in a new variable.
A simplified example of one of the 14 dataframes in the dictionary (the actual dataframes are at most 5000 x 63) would be:
d = {'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'letters': ['a', 'a', 'a', 'b', 'b', 'a', 'b', 'a', 'b', 'b']}
df = pd.DataFrame(data=d)
The other dataframes are named df2, df3, df4, etc.
I hope that makes sense. Any help would be much appreciated.
Thanks
If you want to access both keys and values when iterating over a dictionary, you should use the items method.
You could use another dictionary to store the results:
letter_counts = {}
for key, value in Dict.items():
    letter_counts[key] = value["letters"].value_counts()
You could also use dictionary comprehension to do this in 1 line:
letter_counts = {key: value["letters"].value_counts() for key, value in Dict.items()}
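If a single overview table is more convenient than a dictionary of Series, the counts can be combined afterwards (a sketch; NaN appears where a letter does not occur in a given dataframe):
import pandas as pd

# letter_counts is the dictionary of value_counts Series built above
summary = pd.concat(letter_counts, axis=1)    # one column per dictionary key, one row per letter
print(summary)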
The easiest thing is probably dictionary comprehension:
d = {'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'letters': ['a', 'a', 'a', 'b', 'b', 'a', 'b', 'a', 'b', 'b']}
d2 = {'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 'letters': ['a', 'a', 'a', 'b', 'b', 'a', 'b', 'a', 'b', 'b', 'a']}
df = pd.DataFrame(data=d)
df2 = pd.DataFrame(d2)
df_dict = {'d': df, 'd2': df2}
new_dict = {k: v['letters'].count() for k, v in df_dict.items()}
# out
{'d': 10, 'd2': 11}
I am trying to:
read files and store in dataset
store names of the dataframes in a dataframe
loop to recover the dataframe, and pass it to a function as a dataframe
It doesn't work because when I retrieve the name of the dataframe, it is a str object, not a dataframe, so the calculation fails.
df_files:
dataframe name
0 df_bureau bureau
1 df_previous_application previous_application
Code:
def missing_values_table_for(df_for, name):
    mis_val_for = df_for.isnull().sum()  # count null values
    # -> error

for index, row in df_files.iterrows():
    missing_values_for = missing_values_table_for(dataframe, name)
Thanks in advance.
I believe the best approach here is to work with a dictionary of DataFrames, created by looping over the file names returned by glob:
import glob
import pandas as pd

files = glob.glob('files/*.csv')
dfs = {f: pd.read_csv(f) for f in files}

for k, v in dfs.items():
    df = v.isnull().sum()
    print(df)
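To keep the results instead of only printing them, the same idea extends to a second dictionary (a sketch; missing_counts is a made-up name):
import glob
import pandas as pd

files = glob.glob('files/*.csv')
dfs = {f: pd.read_csv(f) for f in files}

# per-file Series of null counts, keyed by file name
missing_counts = {k: v.isnull().sum() for k, v in dfs.items()}

for name, counts in missing_counts.items():
    print(name)
    print(counts)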