Pandas search lower and upper neighbour within group

I have the following dataframe df, which specifies latitudes and longitudes for a certain groupnumber:
latitude longitude group
0 51.822231 4.700267 1
1 51.822617 4.801417 1
2 51.823235 4.903300 1
3 51.823433 5.003917 1
4 51.823616 5.504467 1
5 51.822231 3.900267 2
6 51.822617 3.901417 2
7 51.823235 3.903300 2
8 51.823433 6.903917 2
9 51.823616 8.904467 2
10 51.822231 1.900267 3
11 51.822617 2.901417 3
12 51.823235 11.903300 3
13 51.823433 12.903917 3
14 51.823616 13.904467 3
Within each group I want to find the lower and upper neighbours of a specified value, longitude_value = 5.00, in the 'longitude' column. The longitudes in df are sorted in ascending order within each group.
For each group I want the two rows whose longitudes are the nearest values below and above 5.000000. The desired output looks like:
latitude longitude group
2 51.823235 4.903300 1
3 51.823433 5.003917 1
7 51.823235 3.903300 2
8 51.823433 6.903917 2
11 51.822617 2.901417 3
12 51.823235 11.903300 3
From this result I want to rearrange the data a little bit as:
lat_lo lat_up lon_lo lon_up
0 51.823235 51.823433 4.903300 5.003917
1 51.823235 51.823433 3.903300 6.903917
2 51.822617 51.823235 2.901417 11.903300

Hope I got your question right. See my attempt below. Made it long to be explicit in my approach. I could have easily introduced a longitude value of 5.00 and sliced on index but that would have complicated answering part 2 of your question. If I missed something, let me know.
Data
import numpy as np
import pandas as pd
df = pd.read_clipboard()
df
Input point and calculate difference with longitude
fn=5.00
df['dif']=(df['longitude']-fn)
df
Find the minimum positive difference in each group
df1 = df[df['dif'] > 0].groupby('group').min().reset_index()
Find the negative difference closest to zero in each group
df2 = df[df['dif'] < 0].groupby('group').max().reset_index()
Combine the two results into one DataFrame (DataFrame.append was removed in pandas 2.0, so use pd.concat). This answers your question 1
df3 = pd.concat([df1, df2], ignore_index=True).sort_values(['group', 'longitude'])
df3
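One caveat with the groupby(...).min() and .max() calls above: they reduce each column independently, so the latitude and longitude returned for a group come from the same row only because both columns happen to ascend together in this data. A minimal sketch that picks whole rows by index label with idxmax/idxmin instead, assuming the sample data from the question is rebuilt by hand (the names below, above and neighbours are illustrative, not from the original answer):

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "latitude": [51.822231, 51.822617, 51.823235, 51.823433, 51.823616] * 3,
    "longitude": [4.700267, 4.801417, 4.903300, 5.003917, 5.504467,
                  3.900267, 3.901417, 3.903300, 6.903917, 8.904467,
                  1.900267, 2.901417, 11.903300, 12.903917, 13.904467],
    "group": [1] * 5 + [2] * 5 + [3] * 5,
})
longitude_value = 5.00

# Split the rows into those below and those above the target value
below = df[df["longitude"] < longitude_value]
above = df[df["longitude"] > longitude_value]

# Index label of the closest row on each side, per group
lo_idx = below.groupby("group")["longitude"].idxmax()
up_idx = above.groupby("group")["longitude"].idxmin()

# Select the complete rows so latitude and longitude stay paired
neighbours = df.loc[lo_idx.tolist() + up_idx.tolist()].sort_index()
print(neighbours)
```

Because whole rows are selected by label, the original index (2, 3, 7, 8, 11, 12) is preserved. A group with no value on one side of the target simply contributes no row for that side.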
Question 2
Introduce a column called Status and fill it with a repeating pattern: 3 for the lower neighbour and 4 for the upper neighbour. This works because df3 is sorted by group and longitude, so the rows alternate lower, upper
df3['Status'] = np.tile([3, 4], len(df3) // 2)
df3 = df3.drop(columns=['dif'])
df3
Rename the neighbours to lon_lo and lon_up
df3['Status'] = np.where(df3['Status'] == 3, 'lon_lo', 'lon_up')
Using pivot_table, split the dataframe into a lon_lo part and a lon_up part, each keeping its latitude. The rationale is to separate the latitudes into two sets, lo and up
first group break
df4=df3[df3['Status']=='lon_lo']
result=df4.pivot_table('longitude',['latitude','group'],'Status').reset_index().set_index('group')
second group break
df4=df3[df3['Status']=='lon_up']
result1=df4.pivot_table('longitude',['latitude','group'],'Status').reset_index().set_index('group')
Merge the two parts on their index, with the lower part on the left so that the latitude suffixes _lo and _up land on the matching columns
final=result.merge(result1, left_index=True, right_index=True, suffixes=('_lo','_up'))
final
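The reshaping for question 2 can also be sketched with a single pivot. This is an alternative under stated assumptions, not the method used above: the six neighbour rows are typed in from the question's desired output, and the helper column side is a made-up name.

```python
import pandas as pd

# The six neighbour rows from the question's desired output
neigh = pd.DataFrame(
    {
        "latitude": [51.823235, 51.823433, 51.823235, 51.823433, 51.822617, 51.823235],
        "longitude": [4.903300, 5.003917, 3.903300, 6.903917, 2.901417, 11.903300],
        "group": [1, 1, 2, 2, 3, 3],
    },
    index=[2, 3, 7, 8, 11, 12],
)

# Rows are sorted by longitude within each group, so the first row of a
# group is the lower neighbour and the second is the upper neighbour
neigh["side"] = neigh.groupby("group").cumcount().map({0: "lo", 1: "up"})

# One row per group, one column per (value, side) pair
wide = neigh.pivot(index="group", columns="side", values=["latitude", "longitude"])

# Flatten the two-level column labels into lat_lo, lat_up, lon_lo, lon_up
short = {"latitude": "lat", "longitude": "lon"}
wide.columns = [f"{short[value]}_{side}" for value, side in wide.columns]
wide = wide[["lat_lo", "lat_up", "lon_lo", "lon_up"]].reset_index(drop=True)
print(wide)
```

The cumcount step assumes every group has exactly one lower and one upper row, in that order.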


persistent column label in pandas dataframe

I have an issue when working with pandas indexing. It first happened on a larger dataset, and I was able to recreate it in this dummy dataframe. Apologies if my table formatting is terrible; I don't know how to make it better visually.
Unnamed: 0 col1 col2 col3
0 Name Sun Mon Tue
1 one 1 2 1
2 two 4 4 3
3 three 2 1 1
4 four 1 5 5
5 five 1 5 5
6 six 5 1 1
7 seven 5 5 6
8 eight 5 3 4
9 nine 5 3 3
What I am trying to do is rename the first column label ('Unnamed: 0') to something meaningful. But when I finally call reset_index, the index "column" has the name "test" for some reason, while the first actual column gets the label "index".
df.rename({df.columns[0]: 'test'}, axis=1, inplace=True)
df.set_index('test', inplace=True)
dft = df.transpose()
dft
test Name one two three four five six seven eight nine
col1 Sun 1 4 2 1 1 5 5 5 5
col2 Mon 2 4 1 5 5 1 5 3 3
col3 Tue 1 3 1 5 5 1 6 4 3
First of all, if my understanding is correct, the index is not even an actual column in the dataframe, so why does it get a label when I reset the index?
And more importantly, why are the labels "test" and "index" reversed?
dft.reset_index(inplace=True)
dft
test index Name one two three four five six seven eight nine
0 col1 Sun 1 4 2 1 1 5 5 5 5
1 col2 Mon 2 4 1 5 5 1 5 3 3
2 col3 Tue 1 3 1 5 5 1 6 4 3
I have tried every combination of set_index / reset_index I can think of, with drop=True and inplace=True, but I cannot find a way to create a proper index like the one I started with.
Yes, both axes (the index and the columns) can have names.
This is useful for multi-indexing.
When you call .reset_index, the index is extracted into a new column. That column takes the name of the index, or 'index' if the index has no name.
If you want, you can rename the index and reset it in one line:
df.rename_axis('Name').reset_index()
Why is 'test' printed not where I expect?
After your code, if you print(dft.columns), you will see:
Index(['index', 'Name', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine'],
dtype='object',
name='test')
There are 11 columns. The column axis is called 'test' (see name='test' in the output above).
Also: print(dft.columns.name) prints test.
So what you actually see when you print your dataframe are the column names, to the left of which is the name of the column axis: 'test'.
It is NOT how the index axis is named. You can check: print(type(dft.index.name)) prints <class 'NoneType'>.
Now, why is column axis named 'test'?
Let's see how it got there step by step.
df.rename({df.columns[0]: 'test'}, axis=1, inplace=True)
First column is now named 'test'.
df.set_index('test', inplace=True)
First column has moved from being a column to being an index. The index name is 'test'. The old index disappeared.
dft = df.transpose()
The column axis is now named 'test'. The index is now named however the column axis was named before transposing.
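To end up with a frame whose column axis carries no leftover name, one option is to clear that name before resetting the index. A minimal sketch, assuming a trimmed-down copy of the dummy data; the label 'day' for the new column is a made-up name for illustration:

```python
import pandas as pd

# Trimmed-down copy of the dummy frame, already indexed by 'test'
df = pd.DataFrame({
    "test": ["Name", "one", "two"],
    "col1": ["Sun", 1, 4],
    "col2": ["Mon", 2, 4],
    "col3": ["Tue", 1, 3],
}).set_index("test")

# Transposing swaps the axes, so the name 'test' moves to the column axis
dft = df.transpose()

clean = (
    dft.rename_axis(None, axis="columns")  # drop the 'test' label on the column axis
       .rename_axis("day", axis="index")   # hypothetical name for the index
       .reset_index()                      # the index becomes a column called 'day'
)
print(clean)
```

Naming the index before reset_index avoids the default 'index' label, and clearing the column-axis name removes the stray 'test' in the printed header.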

Multilevel Indexing with Groupby

Being new to python I'm struggling to apply other questions about the groupby function to my data. A sample of the data frame :
ID Condition Race Gender Income
1 1 White Male 1
2 2 Black Female 2
3 3 Black Male 5
4 4 White Female 3
...
I am trying to use the groupby function to gain a count of how many black/whites, male/females, and income (12 levels) there are in each of the four conditions. Each of the columns, including income, are strings (i.e., categorical).
I'd like to get something such as
Condition Race Gender Income Count
1 White Male 1 19
1 White Female 1 17
1 Black Male 1 22
1 Black Female 1 24
1 White Male 2 12
1 White Female 2 15
1 Black Male 2 17
1 Black Female 2 19
...
Everything I've tried has come back very wrong, so I don't think I'm anywhere near right, but I've been using variations of
Data.groupby(['Condition','Gender','Race','Income'])['ID'].count()
When I run the above line I just get a 2 column matrix with an indecipherable index (e.g., f2df9ecc...) and the second column is labeled ID with what appear to be count numbers. Any help is appreciated.
If you inspect the result, you will see that the grouping columns have become the index, so just reset the index:
df = Data.groupby(['Condition','Gender','Race','Income'])['ID'].count().reset_index()
That was mainly to show what is going on. Since you know what you want, you can pass the argument as_index=False instead:
df = Data.groupby(['Condition','Gender','Race','Income'],as_index=False)['ID'].count()
Also, since you want the last column to be called 'Count':
df = df.rename(columns={'ID':'Count'})
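Another way to express the same count, sketched on a small made-up sample standing in for the real Data frame: size() counts rows per group without selecting a column, and reset_index(name=...) names the result column in the same step.

```python
import pandas as pd

# Made-up sample; every column except ID is a string, as in the question
Data = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Condition": ["1", "1", "1", "2", "2", "2"],
    "Race": ["White", "White", "Black", "Black", "Black", "White"],
    "Gender": ["Male", "Male", "Female", "Female", "Female", "Male"],
    "Income": ["1", "1", "2", "2", "2", "3"],
})

# Count rows per combination and turn the group keys back into columns
counts = (
    Data.groupby(["Condition", "Race", "Gender", "Income"])
        .size()
        .reset_index(name="Count")
)
print(counts)
```

Only combinations that actually occur in the data appear in the result, and the counts add up to the number of rows.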

Pandas Move Same Index Names to Unique Column Names

I have a pandas DataFrame "lty" of shape (366, 1). It is the result of
lty = ylt.groupby([ylt.index.month, ylt.index.day]).mean()
and it looks like the table below. The first "time" index level is the month (January in this case) and the second "time" level is the day of the month:
LT Mean
time time
1 1 11.263604
2 11.971495
3 11.989080
4 12.558736
5 11.850899
I need to reset the index, or otherwise rename the columns, so that "lty" has shape (366, 3) and looks like this:
month day LT Mean
1 1 11.263604
2 11.971495
3 11.989080
4 12.558736
5 11.850899
I have tried to reset the index and i get this error:
lty.reset_index()
ValueError: cannot insert time, already exists
Thank you; I am still learning how to manipulate columns and indexes.
When you group, rename the grouping Index objects so that their names don't collide after the aggregation:
lty = (ylt.groupby([ylt.index.month.rename('month'), ylt.index.day.rename('day')])
.mean().reset_index())
#month day LT Mean
#1 1 11.263604
#1 2 11.971495
#1 3 11.989080
#1 4 12.558736
#1 5 11.850899
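A self-contained sketch of the same idea, assuming a synthetic daily series in place of the real ylt (two years of made-up values, one of them a leap year, so all 366 month/day pairs occur):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for ylt: daily values on a DatetimeIndex named 'time'
idx = pd.date_range("2019-01-01", "2020-12-31", freq="D", name="time")
ylt = pd.DataFrame({"LT Mean": np.arange(len(idx), dtype=float)}, index=idx)

# Both grouping keys would inherit the name 'time'; renaming them gives
# the result index two distinct level names
lty = (
    ylt.groupby([ylt.index.month.rename("month"), ylt.index.day.rename("day")])
       .mean()
       .reset_index()
)
print(lty.head())
print(lty.shape)
```

With distinct level names, reset_index can insert both levels as columns without the "cannot insert time, already exists" error.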

iterrows() of 2 columns and save results in one column

In my data frame I want to iterate over the rows of two columns with iterrows() but save the result in one column. For example, df is
x y
5 10
30 445
70 32
expected output is
points sequence
5 1
10 2
30 1
445 2
I know about iterrows(), but it saves the output in two different columns. How can I get the expected output, and is there a way to generate the sequence number according to a condition? Any help will be appreciated.
First, avoid iterrows where you can, because it is really slow.
If you want a 1, 2 sequence that follows the number of columns, convert the values to a NumPy array with DataFrame.to_numpy, flatten it with numpy.ravel, and build the sequence with numpy.tile:
df = pd.DataFrame({'points': df.to_numpy().ravel(),
'sequence': np.tile([1,2], len(df))})
print (df)
points sequence
0 5 1
1 10 2
2 30 1
3 445 2
4 70 1
5 32 2
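The same result can be sketched without hard-coding the [1, 2] pattern, by stacking the columns and numbering the values within each original row. This is an alternative under the assumption that df holds only the x and y columns from the question; stacked and out are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"x": [5, 30, 70], "y": [10, 445, 32]})

# stack() walks the frame row by row: x then y for each original row
stacked = df.stack()

out = pd.DataFrame({
    "points": stacked.to_numpy(),
    # position of each value within its original row, starting at 1
    "sequence": stacked.groupby(level=0).cumcount().to_numpy() + 1,
})
print(out)
```

Because the sequence is derived from the position within each row, it keeps working if more columns are added.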
Do it this way:
>>> pd.DataFrame([i[1] for i in df.iterrows()])
points sequence
0 5 1
1 10 2
2 30 1
3 445 2

Looking up values from one dataframe in specific row of another dataframe

I'm struggling with a bit of a complex (to me) lookup-type problem.
I have a dataframe df1 that looks like this:
Grade Value
0 A 16
1 B 12
2 C 5
And another dataframe (df2) where the values of the 'Grade' column from df1 form the index:
Tier 1 Tier 2 Tier 3
A 20 17 10
B 16 11 3
C 7 6 2
I've been trying to write a bit of code that, for each row in df1, looks up the row corresponding to 'Grade' in df2, finds the smallest value in that row greater than 'Value', and returns the name of that column.
E.g. for the second row of df1, it looks up the row with index 'B' in df2: 16 is the smallest value greater than 12, so it returns 'Tier 1'. Ideal output would be:
Grade Value Tier
0 A 16 Tier 2
1 B 12 Tier 1
2 C 5 Tier 2
My novice, downloaded-Python-last-week attempt so far has been as follows, which is throwing up all manner of errors and doesn't even try returning the column name. Sorry also about the micro-ness of the question: any help appreciated!
for i, row in input_df1.iterrows():
Tier = np.argmin(df1['Value']<df2.loc[row,0:df2.shape[1]])
df2.loc[df1.Grade].where(lambda d: d.gt(df1.Value.to_numpy(), axis=0)).idxmin(axis=1)
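A worked sketch of the lookup described in the question, with both frames rebuilt by hand. The idea is to mask out every value that is not greater than 'Value' and then take the column label of the smallest remaining value in each row; rows and greater are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({"Grade": ["A", "B", "C"], "Value": [16, 12, 5]})
df2 = pd.DataFrame(
    {"Tier 1": [20, 16, 7], "Tier 2": [17, 11, 6], "Tier 3": [10, 3, 2]},
    index=["A", "B", "C"],
)

# One row of df2 per row of df1, in df1's order
rows = df2.loc[df1["Grade"]]

# Keep only the values strictly greater than Value; the rest become NaN.
# to_numpy() compares by position instead of aligning on index labels.
greater = rows.where(rows.gt(df1["Value"].to_numpy(), axis=0))

# Column label of the smallest remaining value in each row
df1["Tier"] = greater.idxmin(axis=1).to_numpy()
print(df1)
```

This sketch assumes every grade has at least one tier value above 'Value'. A row with none would be entirely NaN, and how idxmin treats that case depends on the pandas version.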