Filtering on dataframe column values with a combination of values - pandas

I have a dataframe which has 2 columns named TABLEID and STATID.
There are different values in both columns.
When I filter the dataframe on the values '101PC' and 'ST101' it gives me 14K records, and when I filter it on '102HT' and 'ST102' it also gives me 14K records. The issue is that when I try to combine both filters as below, I get an empty dataframe. I was expecting 28K records in the resulting dataframe. Any help is much appreciated.
df[df[['TABLEID', 'STATID']].apply(tuple, axis=1).isin([('101PC', 'ST101'), ('102HT', 'ST102')])]
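For reference, here is a minimal, self-contained sketch of the same two-pair filter on toy data (the values are made up). If this pattern returns rows on toy data but an empty frame on the real data, the literal values may not match exactly (e.g. stray whitespace or differing case in TABLEID/STATID):

import pandas as pd

df = pd.DataFrame({
    "TABLEID": ["101PC", "102HT", "103XX", "101PC"],
    "STATID":  ["ST101", "ST102", "ST103", "ST101"],
})

pairs = [("101PC", "ST101"), ("102HT", "ST102")]

# Tuple-based filter, as in the question
out = df[df[["TABLEID", "STATID"]].apply(tuple, axis=1).isin(pairs)]

# Equivalent: an explicit OR of the two per-pair masks
mask = ((df["TABLEID"] == "101PC") & (df["STATID"] == "ST101")) | (
        (df["TABLEID"] == "102HT") & (df["STATID"] == "ST102"))
out2 = df[mask]
print(out.equals(out2))  # True on this toy frame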

Related

Merge two dataframes based on a column which has split values

I have two data frames (both were shown as screenshots that are not reproduced here). In the first, the products column contains values like 1;3;5. After merging the two frames, I split and explode the products column:
Merge_Store_Transaction['products'] = Merge_Store_Transaction['products'].str.split(';')
Merge_Store_Transaction = Merge_Store_Transaction.explode('products')
This gives me a result where all the other values are duplicated, which I don't want. Is there a way to divide the profit column by the respective number of products, or just fill the extra rows with zero?
I think that once you have this result, you can do something like the following:
Merge_Store_Transaction["profit"] = Merge_Store_Transaction.groupby(["group_id", "date"])["profit"].mean().reset_index(0, drop=True)
Same thing for the revenue_in_usd column.
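If the goal is instead to split each profit evenly across the exploded product rows, a minimal sketch (the order_id key and the toy values are assumptions, not from the original post):

import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 2],
    "products": ["1;3;5", "2;4"],
    "profit": [30.0, 10.0],
})

df["products"] = df["products"].str.split(";")
df = df.explode("products")

# Divide each order's profit evenly across its exploded rows
df["profit"] = df["profit"] / df.groupby("order_id")["profit"].transform("size")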

How to filter a pandas dataframe for missing values?

I have a data frame with multiple columns, and some of them have missing values.
I would like to filter so that I get back a dataframe that has missing values in one or two specific columns.
Can anyone help me figure out how to do that?
Given a dataframe "df" with a column "A":
df_missing = df[df['A'].isnull()]
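To filter on two specific columns at once (the column names "A" and "B" are placeholders), the per-column masks can be combined, for example:

import pandas as pd

df = pd.DataFrame({"A": [1, None, 3], "B": [None, None, 6]})

# Rows where A is missing
df_missing = df[df["A"].isnull()]

# Rows where A or B (or both) is missing
df_missing_any = df[df[["A", "B"]].isnull().any(axis=1)]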

Iterate over df columns to get rows with text

I have a dataframe with columns. The columns have mostly blank rows, but a few of the rows contain strings, and those are the only rows I want to see. I have tried the code below but don't know how to select only the strings in the columns, so that I can build a new dataframe with just the rows that contain strings.
columns = list(df)
for i in columns:
    df1 = df[df[i] == ]
Can someone please help?
df[df['column_name'].notna()]
should do the trick
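Note that notna() only treats NaN/None as missing. If the blanks are empty strings, one hedged approach (the replace step is an assumption about the data) is to convert them to NA first, then keep the rows where any column has text:

import pandas as pd

df = pd.DataFrame({
    "a": ["", "hello", ""],
    "b": ["", "", "world"],
})

# Treat empty strings as missing, then keep rows with text in any column
mask = df.replace("", pd.NA).notna().any(axis=1)
rows_with_text = df[mask]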

Convert Series to DataFrame where the Series index is the DataFrame column names

I am selecting row by row as follows:
for i in range(num_rows):
    row = df.iloc[i]
As a result I get a Series object where row.index.values contains the names of the df columns.
But I wanted instead a dataframe with only one row, keeping the dataframe columns in place.
When I do row.to_frame(), instead of a 1x85 dataframe (1 row, 85 cols) I get an 85x1 dataframe whose index contains the column names, and the resulting frame's columns output
Int64Index([0], dtype='int64').
But all I want is the original dataframe's columns with only one row. How do I do it?
Or: how do I turn the row.index values into column values and change the 85x1 shape to 1x85?
You just need to add .T to transpose:
row.to_frame().T
Alternatively, change your for loop to use [], so that iloc returns a one-row DataFrame directly:
for i in range(num_rows):
    row = df.iloc[[i]]
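A small, self-contained illustration of both approaches (toy 2-column frame):

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

row = df.iloc[0]          # Series; its index holds the column names
frame = row.to_frame().T  # 1x2 DataFrame with the original columns

frame2 = df.iloc[[0]]     # slicing with a list yields a DataFrame directly

print(frame.shape, frame2.shape)  # (1, 2) (1, 2)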

How to group by and sum several columns?

I have a big dataframe with several columns containing strings, numbers, etc. I am trying to group by SCENARIO and then sum only the year columns between 2020 and 2050. The only thing I have managed so far is to sum a single column, as shown below, but I need to replace this '2050' with all the columns between 2020 and 2050, for instance.
df1 = df.groupby(["SCENARIO"])['2050'].sum().sum(axis=0)
You are creating a subset of the df with only that single column. I can't tell what your dataset looks like from the information provided, but try:
df.groupby(["SCENARIO"]).sum()
This should sum up all the rows in every column.
Alternatively, select the columns you want to perform the summation on:
df.groupby(["SCENARIO"])[["column1","column2"]].sum()