Code that works gets KeyError: 0 when I iterate it - pandas

When I run the code below, it works:
df[df['column1'].isin([data['column2'][0]])]['column3'][0]
But when I iterate it as below, it gives KeyError: 0:
newlist2 = []
for i in datalist:
    newlist2.append(df[df['column1'].isin([globals()[i]['column2'][0]])]['column3'][0])
Error:
KeyError Traceback (most recent call last)
Input In [67], in <cell line: 4>()
3 newlist2=[]
4 for i in datalist:
----> 5 newlist2.append(mergeddata[mergeddata['DATE_OF_RESTRUCTURE'].isin([globals()[i]['REPORT_DATE'][0]])]['CONTRACT_NUMBER'][0])

Let's split the following code into four parts:
df[df['column1'].isin([globals()[i]['column2'][0]])]['column3'][0]
Condition check: Series.isin().
Boolean indexing: df[boolean]
Column selection: sub_df['column3']
Index selection: column3[0]
When Series.isin() returns all False and you do boolean indexing with it, the resulting sub_df is empty, so your [0] indexing will throw a KeyError (there is no row labelled 0 left to select).
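A defensive sketch of the loop (assuming, as in the question, that datalist holds the names of DataFrames stored as global variables): check whether the filtered frame is empty, and pick the first match by position with .iloc[0] instead of relying on the label 0.
newlist2 = []
for i in datalist:
    source = globals()[i]  # DataFrame looked up by name, as in the question
    matches = df[df['column1'].isin([source['column2'].iloc[0]])]
    if matches.empty:
        newlist2.append(None)  # nothing matched; avoid the KeyError
    else:
        newlist2.append(matches['column3'].iloc[0])  # first match by position, not by label 0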


AttributeError: 'Index' object has no attribute 'mean'

I am encountering an error that I can't get out of:
I have separated my columns according to the unique values of the variables:
cats_df = df.columns[df.nunique() < 6]
num_df = df.columns[df.nunique()>= 6]
And I wanted to replace the missing values of the numerical columns (those with >= 6 unique values) with the mean:
num_df = num_df.fillna(num_df.mean())
But I get this error message:
AttributeError Traceback (most recent call last)
<ipython-input-22-59bfd4048c41> in <module>
----> 1 num_df = num_df.fillna(num_df.mean())
      2 num_df
AttributeError: 'Index' object has no attribute 'mean'
Can you help me solve this problem?
The problem is that num_df is an Index (a list of column labels), not a DataFrame. You may need something like this:
num_df = df[df.columns[df.nunique()>= 6]]
num_df = num_df.fillna(num_df.mean())
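If you also want the filled values written back into the original df, a minimal sketch (assuming df is the full DataFrame from the question) could be:
num_cols = df.columns[df.nunique() >= 6]
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())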

Getting the mean of a column using Groupby

I am attempting to get the mean of a column in a df but keep getting this error when using groupby:
grouped_df = df.groupby('location')['number of orders'].mean()
print(grouped_df)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-5-8cc491c4c100> in <module>
----> 1 grouped_df = df.groupby('location')['number of orders'].mean()
2 print(grouped_df)
NameError: name 'df' is not defined
If I understand your comment correctly, your DataFrame is called df_dig. Accordingly:
grouped_df = df_dig.groupby('location')['number of orders'].mean()
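A minimal, self-contained sketch of the same pattern (the column names are assumed from the question, the data is made up):
import pandas as pd

df_dig = pd.DataFrame({
    'location': ['A', 'A', 'B'],
    'number of orders': [10, 20, 30],
})
grouped_df = df_dig.groupby('location')['number of orders'].mean()
print(grouped_df)  # A -> 15.0, B -> 30.0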

Substituting variables when using Dataframes

I am trying to iterate to_datetime formatting across multiple columns and create a new column with a prefix. The issue I seem to be having is substituting the column header in the to_datetime command. Manually, the command below works:
pipeline['pyCreated_Date'] = pd.to_datetime(pipeline.Created_Date, errors='raise')
But I get an AttributeError: 'DataFrame' object has no attribute 'dh' when I try to iterate. I have searched for answers and tried various attempts based on Renaming pandas data frame columns using a for loop, but I appear to be missing some fundamental information. Here's my most recent code:
date_header = ['Created_Date', 'End_Date', 'Expected_Book_Date', 'Last_Modified_Date',
               'Start_Date', 'Workspace_Won/Lost_Date', 'pyCreated_Date']
for dh in date_header:
    pipeline['py' + dh.format()] = pd.to_datetime(
        pipeline.dh.format(), errors='raise')
It appears dh is not being recognised, as the error reads:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-121-d00bf0a5a7fd> in <module>()
3 date_header = ['Created_Date', 'End_Date', 'Expected_Book_Date', 'Last_Modified_Date', 'Start_Date', 'Workspace_Won/Lost_Date']
4 for dh in date_header:
----> 5 pipeline['py' + dh.format()] = pd.to_datetime(pipeline.dh.format(), errors='raise')
/usr/local/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)
4370 if self._info_axis._can_hold_identifiers_and_holds_name(name):
4371 return self[name]
-> 4372 return object.__getattribute__(self, name)
4373
4374 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'dh'
What is the correct syntax to achieve this please? Apologies if it's a rookie mistake but I appreciate your support.
Many thanks
UPDATED after ALollz's kind help!
Here's what finally worked
for col_name in date_header:
    pipeline['py' + col_name.format()] = pd.to_datetime(pipeline[col_name], errors='coerce')
    print(f"{pipeline['py' + col_name.format()].value_counts(dropna=False)}")

Tensorflow tf.split() list index out of range?

Here's the code:
a = tf.constant([1,2,3,4])
b = tf.constant([4])
c = tf.split(a, tf.squeeze(b))
Then it fails:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jeff/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1203, in split
num = size_splits_shape.dims[0]
IndexError: list index out of range
But why?
The docs state,
If num_or_size_splits is a tensor, size_splits, then splits value into len(size_splits) pieces. The shape of the i-th piece has the same size as the value except along dimension axis where the size is size_splits[i].
Note that size_splits needs to be sliceable.
However, when you squeeze(b), because b has only one element in your example, the result is a scalar with no dimensions. A scalar cannot be sliced:
b_ = tf.squeeze(b)
b_[0] # error
Hence your error.
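Two ways around it, as a sketch (exact behaviour can vary across TensorFlow versions):
import tensorflow as tf

a = tf.constant([1, 2, 3, 4])
b = tf.constant([4])

# Keep b 1-D: a size_splits tensor of length 1 yields a single piece of size 4.
c1 = tf.split(a, b)

# Or pass a plain Python int to split into that many equal pieces.
c2 = tf.split(a, 4)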

to_dataframe() bug when query returns no results

If a valid BigQuery query returns 0 rows, to_dataframe() crashes. (btw, I am running this on Google Cloud Datalab)
for example:
q = bq.Query('SELECT * FROM [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls] WHERE ( Protein_Change="V600E" ) LIMIT 10')
r = q.results()
r.to_dataframe()
produces:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-17-de55245104c0> in <module>()
----> 1 r.to_dataframe()
/usr/local/lib/python2.7/dist-packages/gcp/bigquery/_table.pyc in to_dataframe(self, start_row, max_rows)
628 # Need to reorder the dataframe to preserve column ordering
629 ordered_fields = [field.name for field in self.schema]
--> 630 return df[ordered_fields]
631
632 def to_file(self, destination, format='csv', csv_delimiter=',', csv_header=True):
TypeError: 'NoneType' object has no attribute '__getitem__'
Is this a known bug?
Certainly not a known bug. Please do log a bug as mentioned by Felipe.
Contributions, both bug reports, and of course fixes, are welcome! :)