Pandas: Move Same Index Names to Unique Column Names

I have a pandas df "lty" of shape (366, 1). It is the result of
lty = ylt.groupby([ylt.index.month, ylt.index.day]).mean()
so it has a MultiIndex with two levels, both named "time": the first level is the month (1, i.e. January, in this case) and the second is the day of the month:
           LT Mean
time time
1    1    11.263604
     2    11.971495
     3    11.989080
     4    12.558736
     5    11.850899
I need to reset the index, or otherwise rename the columns, so that "lty" has shape (366, 3) and looks like this:
   month  day    LT Mean
0      1    1  11.263604
1      1    2  11.971495
2      1    3  11.989080
3      1    4  12.558736
4      1    5  11.850899
I have tried to reset the index and I get this error:
lty.reset_index()
ValueError: cannot insert time, already exists
Thank you; I am still learning how to manipulate columns and indexing.

When you groupby, rename the grouping Index attributes so that the names don't collide after the aggregation:
lty = (ylt.groupby([ylt.index.month.rename('month'), ylt.index.day.rename('day')])
.mean().reset_index())
# month  day    LT Mean
#     1    1  11.263604
#     1    2  11.971495
#     1    3  11.989080
#     1    4  12.558736
#     1    5  11.850899
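If the groupby has already run, an equivalent fix is a small sketch like the following: rename the index levels and then reset (the level names 'month' and 'day' are my choice):
lty = lty.rename_axis(['month', 'day']).reset_index()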

Related

persistent column label in pandas dataframe

I have an issue when trying to work with pandas' indexing. This first happened on a larger dataset, and I was able to recreate it in this dummy dataframe. Apologies if my table formatting is terrible; I don't know how to make it better visually.
Unnamed: 0 col1 col2 col3
0 Name Sun Mon Tue
1 one 1 2 1
2 two 4 4 3
3 three 2 1 1
4 four 1 5 5
5 five 1 5 5
6 six 5 1 1
7 seven 5 5 6
8 eight 5 3 4
9 nine 5 3 3
What I am trying to do is rename the first column label ('Unnamed: 0') to something meaningful. But when I finally try to reset_index, the index "column" gets the name "test" for some reason, while the first actual column gets the label "index".
df.rename({df.columns[0]: 'test'}, axis=1, inplace=True)
df.set_index('test', inplace=True)
dft = df.transpose()
dft
test Name one two three four five six seven eight nine
col1 Sun 1 4 2 1 1 5 5 5 5
col2 Mon 2 4 1 5 5 1 5 3 3
col3 Tue 1 3 1 5 5 1 6 4 3
First of all, if my understanding is correct, the index is not even an actual column in the dataframe, so why does it get to have a label when resetting the index?
And more importantly, why are the labels "test" and "index" reversed?
dft.reset_index(inplace=True)
dft
test index Name one two three four five six seven eight nine
0 col1 Sun 1 4 2 1 1 5 5 5 5
1 col2 Mon 2 4 1 5 5 1 5 3 3
2 col3 Tue 1 3 1 5 5 1 6 4 3
I have tried every possible combination of set_index / reset_index I can think of, trying drop=True and inplace=True, but I cannot find a way to create a proper index like the one I started with.
Yes, the axes (the index axis and the column axis) can have names.
This is useful for multi-indexing.
When you call .reset_index, the index is extracted into a new column, which is named after the index's name (by default, 'index').
If you want, you can reset and rename index in one line:
df.rename_axis('Name').reset_index()
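For example, a minimal sketch (the frame and names here are made up):
import pandas as pd
df = pd.DataFrame({'a': [1, 2]}, index=['x', 'y'])
out = df.rename_axis('Name').reset_index()
print(out.columns.tolist())  # ['Name', 'a'] rather than ['index', 'a']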
Why is 'test' printed not where I expect?
After your code, if you print(dft.columns), you will see:
Index(['index', 'Name', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine'],
dtype='object',
name='test')
There are 11 columns. The column axis is called 'test' (see name='test' in the output above).
Also: print(dft.columns.name) prints test.
So what you actually see when you print your dataframe are the column names, to the left of which is the name of the column axis: 'test'.
It is NOT how the index axis is named. You can check: print(type(dft.index.name)) prints <class 'NoneType'>.
Now, why is column axis named 'test'?
Let's see how it got there step by step.
df.rename({df.columns[0]: 'test'}, axis=1, inplace=True)
The first column is now named 'test'.
df.set_index('test', inplace=True)
The first column has moved from being a column to being the index. The index name is 'test'. The old index disappeared.
dft = df.transpose()
The column axis is now named 'test' (the old index name), and the index is now named whatever the column axis was named before transposing (here, nothing).
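If the goal is simply to get back a clean frame, one sketch (reusing rename_axis from above) is to clear the stray column-axis name and move the name to the index before resetting:
dft = dft.rename_axis(columns=None)  # drop the 'test' name from the column axis
dft = dft.rename_axis('test')        # name the index 'test' instead
dft.reset_index()                    # the new column is now labelled 'test'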

How to keep only the last index in groups of rows where a condition is met in pandas?

I have the following dataframe:
d = {'value': [1,1,1,1,1,1,1,1,1,1], 'flag_1': [0,1,0,1,1,1,0,1,1,1],'flag_2':[1,0,1,1,1,1,1,0,1,1],'index':[1,2,3,4,5,6,7,8,9,10]}
df = pd.DataFrame(data=d)
I need to perform the following filter on it:
If flag 1 and flag 2 are equal, keep only the row with the maximum index from each run of consecutive indices. Below, flag 1 and flag 2 are equal for rows 4, 5, 6 and for rows 9, 10. From the group of consecutive indices 4, 5, 6 I therefore wish to keep only row 6 and drop rows 4 and 5. From the next group, rows 9 and 10, I wish to keep only row 10. The rows where flags 1 and 2 are not equal should all be retained. My desired final output matches the result shown in the answer below.
I am really not sure how to achieve what is required so I would be grateful for any advice on how to do it.
IIUC, you can compare consecutive rows with shift. This solution requires a sorted index.
In [5]: df[~df[['flag_1', 'flag_2']].eq(df[['flag_1', 'flag_2']].shift(-1)).all(axis=1)]
Out[5]:
value flag_1 flag_2 index
0 1 0 1 1
1 1 1 0 2
2 1 0 1 3
5 1 1 1 6
6 1 0 1 7
7 1 1 0 8
9 1 1 1 10
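An equivalent sketch of the same idea, spelled out with explicit run ids (the variable names are my own), in case the one-liner is hard to read:
import pandas as pd
d = {'value': [1,1,1,1,1,1,1,1,1,1], 'flag_1': [0,1,0,1,1,1,0,1,1,1],
     'flag_2': [1,0,1,1,1,1,1,0,1,1], 'index': [1,2,3,4,5,6,7,8,9,10]}
df = pd.DataFrame(data=d)
# start a new run whenever the flag pair changes from the previous row
runs = df[['flag_1', 'flag_2']].ne(df[['flag_1', 'flag_2']].shift()).any(axis=1).cumsum()
# keep the last row of each run of identical flag pairs
result = df.groupby(runs).tail(1)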

How to split pandas dataframe into multiple dataframes (holding together rows) based upon a column's value

My problem is similar to the split a dataframe into chunks of N rows problem, except that the number of rows in each chunk will be different. I have a dataframe as such:
A  B  C
1  2  0
1  2  1
1  2  2
1  2  0
1  2  1
1  2  2
1  2  3
1  2  4
1  2  0
Columns A and B are just filler; don't pay attention to them. Column C, though, starts at 0 and increments with each row until it suddenly resets to 0. So in the dataframe above the first 3 rows form a new dataframe, then the next 5 form a second new dataframe, and this continues as my dataframe adds more and more rows.
To finish off the question,
dfs = [x for _, x in df.groupby(df['C'].eq(0).cumsum())]
splits the frame into the subgroups (assigning to a new name, dfs, so the original df is not overwritten), and from the resulting list I can select each subgroup as a separate dataframe.
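A quick sketch of why the grouping key works, rebuilding the example data above:
import pandas as pd
df = pd.DataFrame({'A': [1]*9, 'B': [2]*9,
                   'C': [0, 1, 2, 0, 1, 2, 3, 4, 0]})
# C == 0 marks the start of a chunk; the cumulative sum therefore gives
# every row of a chunk the same id: 1,1,1, 2,2,2,2,2, 3
chunk_id = df['C'].eq(0).cumsum()
dfs = [chunk for _, chunk in df.groupby(chunk_id)]
print(len(dfs))   # 3 chunks: 3 rows, 5 rows, 1 row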

Pandas search lower and upper neighbour within group

I have the following dataframe df, which specifies latitudes and longitudes for a certain groupnumber:
latitude longitude group
0 51.822231 4.700267 1
1 51.822617 4.801417 1
2 51.823235 4.903300 1
3 51.823433 5.003917 1
4 51.823616 5.504467 1
5 51.822231 3.900267 2
6 51.822617 3.901417 2
7 51.823235 3.903300 2
8 51.823433 6.903917 2
9 51.823616 8.904467 2
10 51.822231 1.900267 3
11 51.822617 2.901417 3
12 51.823235 11.903300 3
13 51.823433 12.903917 3
14 51.823616 13.904467 3
Within each group number I try to find the lower and upper neighbour of the column 'longitude' for a specified value longitude_value = 5.00. The longitudes in df are sorted within each group (they ascend within each group).
Per group I want to have the upper and lower neighbour values of longitude = 5.000000. The desired output looks like:
latitude longitude group
2 51.823235 4.903300 1
3 51.823433 5.003917 1
7 51.823235 3.903300 2
8 51.823433 6.903917 2
11 51.822617 2.901417 3
12 51.823235 11.903300 3
From this result I want to rearrange the data a little bit as:
lat_lo lat_up lon_lo lon_up
0 51.823235 51.823433 4.903300 5.003917
1 51.823235 51.823433 3.903300 6.903917
2 51.822617 51.823235 2.901417 11.903300
Hope I got your question right; see my attempt below. I made it long so as to be explicit in my approach. I could easily have introduced a longitude value of 5.00 and sliced on the index, but that would have complicated answering part 2 of your question. If I missed something, let me know.
Data
df=pd.read_clipboard()
df
Input point and calculate difference with longitude
fn=5.00
df['dif']=(df['longitude']-fn)
df
Find the minimum positive difference in each group
df1 = df[df['dif'] > 0].groupby('group').min().reset_index()
Find the maximum negative difference (the closest from below) in each group
df2 = df[df['dif'] < 0].groupby('group').max().reset_index()
Concatenate the second frame to the first into one df. This answers your question 1
df3 = pd.concat([df1, df2], ignore_index=True).sort_values(['group','longitude'])
df3
Question 2
Introduce a column called Status with an alternating pattern: 3 for the lower neighbour and 4 for the upper neighbour (after the sort above, the rows alternate lower/upper within each group)
df3['Status'] = np.resize([3, 4], len(df3))
df3.drop(columns=['dif'], inplace=True)
df3
Rename the neighbours to lon_lo and lon_up
df3['Status'] = df3['Status'].map({3: 'lon_lo', 4: 'lon_up'})
Using pivot, break the dataframe up into lon_lo and latitude, and do the same for lon_up. The rationale here is to split the latitudes into two groups, lo and up
first group break
df4=df3[df3['Status']=='lon_lo']
result=df4.pivot_table('longitude',['latitude','group'],'Status').reset_index().set_index('group')
second group break
df4=df3[df3['Status']=='lon_up']
result1=df4.pivot_table('longitude',['latitude','group'],'Status').reset_index().set_index('group')
Merge the two groups on the index, renaming the latitudes to lo and up (the lower-neighbour frame goes on the left so the '_lo' suffix lands on it)
final=result.merge(result1, left_index=True, right_index=True, suffixes=('_lo','_up'))
final
Output
       latitude_lo    lon_lo  latitude_up     lon_up
group
1        51.823235  4.903300    51.823433   5.003917
2        51.823235  3.903300    51.823433   6.903917
3        51.822617  2.901417    51.823235  11.903300
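For comparison, a much shorter sketch of the same idea (variable names are my own; it relies on the longitudes being sorted within each group): take the last row below the target and the first row above it, per group, then join:
fn = 5.00
lo = df[df['longitude'] < fn].groupby('group').tail(1).set_index('group')
up = df[df['longitude'] > fn].groupby('group').head(1).set_index('group')
final = lo.join(up, lsuffix='_lo', rsuffix='_up')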

Remove All Pandas Rows Except the Row Closest to the Start of Each Hour

I have a frame, df:
Date A B C
x 1 1 1
y 1 1 1
z 1 1 1
The "Date" column is my index, and all timestamps are random times down to the second level. I want to remove all rows in the dataframe, except for the row that is the closest to the start of a new hour.
For example, if 12/15/16 15:16:12 is the earliest row in hour 15 of that date, I want every other row with a time stamp greater than that stamp to be deleted. I then want the process repeated for the next hour, and so on.
Is this possible in a fast manner in pandas?
Thanks
You can use groupby and head after sort_index:
df.sort_index().groupby(df.index.strftime('%Y-%m-%d %H')).head(1)
Out[83]:
A
Date
2016-12-15 15:16:12 1
Data input
df
Out[84]:
A
Date
2016-12-15 15:16:12 1
2016-12-15 15:19:12 1
2016-12-15 15:56:12 1
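An equivalent sketch that avoids the string formatting is to floor each timestamp to the hour (assuming a DatetimeIndex, as above):
df = df.sort_index()
df.groupby(df.index.floor('h')).head(1)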