What is the difference between slicing the rows and fetching the column, versus fetching the column and slicing the rows? - pandas

What is the difference between the following two ways of getting a range of rows from one column of df, a pandas DataFrame?
df['col_name'][x:y]
vs.
df.loc[x:y,'col_name']
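A minimal sketch of one visible difference, assuming a default integer index and made-up data (the values below are hypothetical). On the selected column, an integer slice inside [ ] is positional and excludes the end, while .loc slices by label and includes both ends.

```python
import pandas as pd

# Hypothetical data with the default index 0..4
df = pd.DataFrame({"col_name": [10, 20, 30, 40, 50]})

# Fetch the column, then slice: positions 1 and 2 (end excluded)
by_position = df["col_name"][1:3]

# Slice rows and fetch the column in one step: labels 1, 2 and 3 (end included)
by_label = df.loc[1:3, "col_name"]

print(by_position.tolist())  # [20, 30]
print(by_label.tolist())     # [20, 30, 40]
```

The first form is also chained indexing, so assigning through it may not modify df; .loc does the selection in a single step.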

Related

Find the average of a column based on another column in pandas?

I'm working on a jupyter notebook, and I would like to get the average 'pcnt_change' based on 'day_of_week'. How do I do this?
A simple groupby call would do the trick here.
If df is the pandas dataframe:
df.groupby('day_of_week').mean()
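would return a dataframe with the average of all numeric columns, with day_of_week as the index (recent pandas versions may need mean(numeric_only=True) if the dataframe also contains non-numeric columns). If you want only certain columns to be returned, select just those columns, plus day_of_week, before the groupby call, for example: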
df[['open_price', 'high_price', 'day_of_week']].groupby('day_of_week').mean()
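As a small sketch for the 'pcnt_change' column from the question, assuming made-up values (the data below is hypothetical): selecting the column after the groupby returns just that average per day.

```python
import pandas as pd

# Hypothetical data using the column names from the question
df = pd.DataFrame({
    "day_of_week": ["Mon", "Tue", "Mon", "Tue"],
    "pcnt_change": [1.0, 2.0, 3.0, 6.0],
})

# Average pcnt_change for each day_of_week, returned as a Series
avg_change = df.groupby("day_of_week")["pcnt_change"].mean()
print(avg_change)
```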

Filtering a dataframe on combinations of column values

I have a dataframe which has 2 columns named TABLEID and STATID.
There are different values in both columns.
When I filter the dataframe on the values '101PC' and 'ST101', it gives me 14K records, and when I filter it on the values '102HT' and 'ST102', it also gives me 14K records. The issue is that when I try to combine both filters as shown below, I get an empty dataframe. I was expecting 28K records in the resulting dataframe. Any help is much appreciated.
df[df[['TABLEID','STATID']].apply(tuple, axis = 1).isin([('101PC', 'ST101'), ('102HT','ST102')])]
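The thread does not show why the expression above comes back empty, so the following is only an alternative sketch, not a diagnosis. It builds one boolean mask per (TABLEID, STATID) pair and combines them with |, using hypothetical sample rows.

```python
import pandas as pd

# Hypothetical rows using the column names from the question
df = pd.DataFrame({
    "TABLEID": ["101PC", "101PC", "102HT", "102HT", "103XX"],
    "STATID": ["ST101", "ST999", "ST102", "ST102", "ST101"],
})

# One mask per pair: both columns must match within a pair
first_pair = (df["TABLEID"] == "101PC") & (df["STATID"] == "ST101")
second_pair = (df["TABLEID"] == "102HT") & (df["STATID"] == "ST102")

# A row is kept if it matches either pair
combined = df[first_pair | second_pair]
print(combined)
```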

How to convert pandas dataframe to single index after aggregation?

I have been playing with aggregation in pandas dataframe. Considering the following dataframe:
df=pd.DataFrame({'a':[1,2,3,4,5,6,7,8],
'batch':['q','q','q','w','w','w','w','e'],
'c':[4,1,3,4,5,1,3,2]})
I have to do aggregation on the batch column with mean for column a and min for column c.
I used the following method to do the aggregation:
agg_dict = {'a':{'a':'mean'},'c':{'c':'min'}}
aggregated_df = df.groupby("batch").agg(agg_dict)
The problem is that I want the final data frame to have the same columns as the original data frame with the slight difference of having the aggregated values present in each of the columns.
The result of the above aggregation is a multi-index data frame, and I am not sure how to convert it to a single-index data frame.
I followed the link: Reverting from multiindex to single index dataframe in pandas. But this didn't work, and the final output was still a multi-index data frame.
It would be great if someone could help.
You can try the following code:
df.groupby('batch').aggregate({'c': 'min', 'a': 'mean'})
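A sketch of the same idea on the data frame from the question: a flat dictionary of column name to function name gives single-level columns, and reset_index() (an addition here, not part of the answer above) turns batch back into an ordinary column.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8],
                   'batch': ['q', 'q', 'q', 'w', 'w', 'w', 'w', 'e'],
                   'c': [4, 1, 3, 4, 5, 1, 3, 2]})

# Mean of a and min of c per batch; reset_index() moves batch
# from the index back into a regular column
aggregated_df = df.groupby('batch').agg({'a': 'mean', 'c': 'min'}).reset_index()
print(aggregated_df)
```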

Delete rows in a dataframe by a range of dates

I have a dataframe with a column 'date' (YYYY-MM-DD HH:MM:SS) and datetime64 type.
I want to drop/eliminate rows by selecting ranges of dates. How can I do this in Python/pandas?
Thank you so much in advance
(I cannot post comments, thus I dare to put an answer) The following questions also refer to deleting or filtering a data frame based on the value of a given column:
Delete rows from a pandas DataFrame based on a conditional expression involving len(string) giving KeyError
Deleting DataFrame row in Pandas based on column value
Basically, you can pass a boolean array to the index operator [ ] of the data frame; this returns the filtered data frame. The pandas v1.0.1 (!) documentation explains how to index data frames. Also this question is helpful.
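A minimal sketch of the boolean-array approach for a date range, assuming a datetime64 column named 'date' as in the question; the sample rows and the start and end bounds are hypothetical.

```python
import pandas as pd

# Hypothetical data with a datetime64 'date' column
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-01-01 00:00:00", "2020-01-15 12:30:00",
        "2020-02-01 08:00:00", "2020-03-01 00:00:00",
    ]),
    "value": [1, 2, 3, 4],
})

# Range of dates to drop (both bounds are included by between())
start = pd.Timestamp("2020-01-10")
end = pd.Timestamp("2020-02-15")
in_range = df["date"].between(start, end)

# Keep only the rows that fall outside the range
kept = df[~in_range]
print(kept)
```

Several ranges could be dropped by combining one such mask per range with | before negating.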

pandas : Indexing for thousands of rows in dataframe

I initially had 100k rows in my dataset. I read the csv using pandas into a dataframe called data. I tried to do a subset selection of 51 rows using .loc. My index labels are numeric values 0, 1, 2, 3 etc. I tried using this command -
data = data.loc['0':'50']
But the results were weird: it returned all the rows from 0 to 49999, as if it kept taking rows until it reached an index value starting with 50.
Similarly, I tried with this command - new_data = data.loc['0':'19']
and the result was all the rows, starting from 0 till 18999.
Could this be a bug in pandas?
You want to use .iloc in place of .loc, since you are selecting data from the dataframe via numeric indices.
For example:
data.iloc[:50,:]
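Keep in mind that your index labels are numeric, not strings, so slicing with strings (as you have done in your OP) does not select by integer position. Also note that .iloc excludes the end position, so data.iloc[:50,:] returns 50 rows; use data.iloc[:51,:] for 51 rows, or the label-based data.loc[0:50] with integers, which includes both end points.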
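A small sketch comparing the two selections, assuming a default integer index; the 100-row frame below is a hypothetical stand-in for the 100k-row dataset.

```python
import pandas as pd

# Hypothetical stand-in for the dataset, with index labels 0..99
data = pd.DataFrame({"value": range(100)})

# Positional slice: positions 0..50, end position 51 excluded
by_position = data.iloc[:51, :]

# Label slice with integers (not strings): labels 0..50, both ends included
by_label = data.loc[0:50]

print(len(by_position), len(by_label))  # 51 51
```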