Trying to mark first encounter in every group pandas


I have a df with 5 columns:
What I'm trying to do is mark the first customer interaction that follows talking to a human, in every group.
Ideally, the outcome would look like this:
What I have tried is shifting the type column so that each row can see the previous row's type, then checking whether the current row is customer and the previous row is human. However, I can't figure out how to group this to get the minimum index of such occurrences in each group.

This works:
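import pandas as pd

# For each group: flag customer rows whose previous row (within the group) is human, then
# take the label of the first True (idxmax) and of the last True (idxmax on the reversed Series)
k = pd.DataFrame(df.groupby('group').apply(lambda g: (g['type'].eq('customer') & g['type'].shift(1).eq('human')).pipe(lambda x: [x.idxmax(), x[::-1].idxmax()])).tolist())
df['First'] = ''
df['Last'] = ''
df.loc[k[0], 'First'] = 'F'   # column 0 holds the first match per group
df.loc[k[1], 'Last'] = 'L'    # column 1 holds the last match per group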
Output:
>>> df
    group      type  First  Last
0       x       bot
1       x  customer
2       x       bot
3       x  customer
4       x     human
5       x  customer      F
6       x     human
7       x  customer            L
8       y       bot
9       y  customer
10      y       bot
11      y  customer
12      y     human
13      y  customer      F
14      y     human
15      y  customer            L
16      z       bot
17      z  customer
18      z       bot
19      z  customer
20      z     human
21      z  customer      F
22      z     human
23      z  customer            L
24      z  customer
25      z  customer
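As a hedged alternative sketch, the same marks can be produced without groupby.apply: compute the flag with a per-group shift, then take the first and last flagged row of each group with head(1) and tail(1). One reason to prefer this: on a group with no customer-after-human row, idxmax of an all-False Series returns the group's first label, so the apply version would mark a wrong row, while here such a group simply gets no mark. The sample frame is rebuilt from the output above (only the group and type columns, since the other columns are not shown).

```python
import pandas as pd

# Sample data rebuilt from the output above (only 'group' and 'type' are known)
pattern = ['bot', 'customer', 'bot', 'customer', 'human', 'customer', 'human', 'customer']
df = pd.DataFrame({
    'group': ['x'] * 8 + ['y'] * 8 + ['z'] * 10,
    'type': pattern + pattern + pattern + ['customer', 'customer'],
})

# True where a customer row directly follows a human row within the same group
prev_type = df.groupby('group')['type'].shift(1)
hit = df['type'].eq('customer') & prev_type.eq('human')

# First and last matching row per group; groups without any match get no mark
first_idx = df[hit].groupby('group').head(1).index
last_idx = df[hit].groupby('group').tail(1).index

df['First'] = ''
df['Last'] = ''
df.loc[first_idx, 'First'] = 'F'
df.loc[last_idx, 'Last'] = 'L'
print(df)
```

If a group has exactly one match, head(1) and tail(1) return the same row, so it receives both F and L, which matches the behaviour of the idxmax version.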

Related

Pandas - find value in column based on values from another column and replace data in different column

I have a df that looks like below:
ID Name Supervisor SupervisorID
1 X Y
2 Y C
3 Z Y
4 C Y
5 V X
What I need is to fill in SupervisorID. I can find a supervisor's ID by looking up their name in the Name column: if the supervisor is Y, the row with Name Y has ID 2, so the SupervisorID is 2. The DataFrame should look like this:
ID Name Supervisor SupervisorID
1 X Y 2
2 Y C 4
3 Z Y 2
4 C Y 2
5 V X 1
Do you have any idea how to solve this?
Thanks for the help, and best regards.

Use Series.map with DataFrame.drop_duplicates to get unique Names, because the real data may contain duplicates:
df['SupervisorID']=df['Supervisor'].map(df.drop_duplicates('Name').set_index('Name')['ID'])
print (df)
ID Name Supervisor SupervisorID
0 1 X Y 2
1 2 Y C 4
2 3 Z Y 2
3 4 C Y 2
4 5 V X 1
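A self-contained version of the map approach, with the sample frame typed in from the question, may help when trying it out. Keeping the first row for a repeated name (the default of drop_duplicates) is an assumption about how duplicate names should be resolved.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['X', 'Y', 'Z', 'C', 'V'],
    'Supervisor': ['Y', 'C', 'Y', 'Y', 'X'],
})

# Lookup Series Name -> ID; keep only the first row per name so the index is unique
name_to_id = df.drop_duplicates('Name').set_index('Name')['ID']

# Series.map looks up each supervisor name in that index; names not found become NaN
df['SupervisorID'] = df['Supervisor'].map(name_to_id)
print(df)
```

If some supervisor never appears in Name, that row gets NaN and the column turns into float.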

Conditional frequency of elements within lists in pandas data frame

I have a data frame in pandas like this:
STATUS FEATURES
A [x,y,z]
A [t, y]
B [x,p,t]
B [x,p]
I want to count the frequency of the elements in the lists of features conditional on the status.
The desired output would be:
STATUS FEATURES FREQUENCY
A x 1
A y 2
A z 1
A t 1
B x 2
B t 1
B p 2

Let's use explode, then groupby + size:
s = df.explode(['FEATURES']).groupby(['STATUS','FEATURES']).size().reset_index(name='FREQUENCY')

Use DataFrame.explode and SeriesGroupBy.value_counts:
new_df = (df.explode('FEATURES')
.groupby('STATUS')['FEATURES']
.value_counts()
.reset_index(name='FREQUENCY'))
print(new_df)
Output
STATUS FEATURES FREQUENCY
0 A y 2
1 A t 1
2 A x 1
3 A z 1
4 B p 2
5 B x 2
6 B t 1
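A runnable sketch of the explode-then-count idea on the question's sample data, using groupby().size() and naming the count column FREQUENCY as the question asks. size() keeps the rows sorted by STATUS and feature, whereas value_counts orders each group by count, so pick whichever ordering you need.

```python
import pandas as pd

df = pd.DataFrame({
    'STATUS': ['A', 'A', 'B', 'B'],
    'FEATURES': [['x', 'y', 'z'], ['t', 'y'], ['x', 'p', 't'], ['x', 'p']],
})

# explode gives one row per (STATUS, feature) pair; size counts rows per pair
freq = (df.explode('FEATURES')
          .groupby(['STATUS', 'FEATURES'])
          .size()
          .reset_index(name='FREQUENCY'))
print(freq)
```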

difference between rows in a multi-level dataframe

I am looking to find the difference between rows of a multi-level dataset by iterating over the values within each class. I have been trying different techniques from tutorials, as I am still new to Python/pandas.
What I am trying to do is find the difference between the teacher's score and each student's score in a given class.
DataFrame:
Class  Name  Reference  stats
X      SHE   student    30
X      GHE   student    20
X      GMK   student    10
X      JKO   teacher    50
Y      HHH   student    20
Y      KLP   teacher    30
Output:
Class  teacher  student  difference
X      JKO      SHE      20
X      JKO      GHE      30
X      JKO      GMK      40
Y      KLP      HHH      10
Can anyone point me in the right direction? Note that there can be more than one teacher in a class.
Thank you

Just split your dataset into two DataFrames, one for students and one for teachers, then merge them on Class.
students = df[df.Reference == 'student'][['Class','Name','stats']]
teachers = df[df.Reference == 'teacher'][['Class','Name','stats']]
new_df = students.merge(teachers, on='Class', suffixes=('_student','_teacher'))
new_df['difference'] = new_df.stats_teacher - new_df.stats_student
print(new_df)
Class Name_student stats_student Name_teacher stats_teacher difference
0 X SHE 30 JKO 50 20
1 X GHE 20 JKO 50 30
2 X GMK 10 JKO 50 40
3 Y HHH 20 KLP 30 10

Use:
print (df)
Class Name Reference stats
0 X SHE student 30
1 X GHE student 20
2 X GMK student 10
3 X JKO teacher 50
4 X ABC teacher 100 <- one row added to make the data more general (a second teacher in class X)
5 Y HHH student 20
6 Y KLP teacher 30
df = (df.query("Reference == 'teacher'")
.merge(df.query("Reference == 'student'"), on='Class', suffixes=('_t','_s'))
.assign(difference=lambda x: x['stats_t'] - x['stats_s'])
.drop(['Reference_s','Reference_t','stats_s','stats_t'], axis=1)
.rename(columns={'Name_s':'student','Name_t':'teacher'})
)
print (df)
Class teacher student difference
0 X JKO SHE 20
1 X JKO GHE 30
2 X JKO GMK 40
3 X ABC SHE 70
4 X ABC GHE 80
5 X ABC GMK 90
6 Y KLP HHH 10
Explanation:
Filter the DataFrame with query to get the teacher rows and the student rows
Then merge on the Class column to get all teacher/student combinations within each class
Then assign a new difference column by subtraction
Remove the unnecessary columns with drop
Finally, rename the columns

Below is a solution with nested for loops, so there should be a more optimized solution than this. (I will try to improve it later.)
import pandas as pd

df = pd.read_csv("student.csv")
# index labels of all teacher rows; each teacher is assumed to come right after their students
ref = df[df['Reference'] == 'teacher'].index.values.astype(int)
df['TeacherName'] = 'NA'
df['Difference'] = 0
for i in range(len(ref)):
    # students of teacher i sit between the previous teacher row and this one
    start = 0 if i == 0 else ref[i - 1] + 1
    for j in range(start, ref[i]):
        # .loc avoids chained assignment (df['col'][j] = ...), which may not write back to df
        df.loc[j, 'TeacherName'] = df.loc[ref[i], 'Name']
        df.loc[j, 'Difference'] = df.loc[ref[i], 'stats'] - df.loc[j, 'stats']
# drop the teacher rows themselves
df = df[~df.index.isin(ref)]
print(df)
I collect the index of every row where df['Reference'] == 'teacher' into an array named ref; those teacher rows are dropped from df after the loops.
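The loop relies on row order: each teacher row comes after the students it belongs to. Under that same assumption, a loop-free sketch is to blank out every non-teacher value with Series.where and back-fill within each Class, so each student picks up the next teacher row below it in the same class. This is a swapped-in technique, not the answer's code, and it only gives the right pairing when the rows are ordered as in the question (one teacher after their students).

```python
import pandas as pd

# Sample data from the question, in its original row order
df = pd.DataFrame({
    'Class': ['X', 'X', 'X', 'X', 'Y', 'Y'],
    'Name': ['SHE', 'GHE', 'GMK', 'JKO', 'HHH', 'KLP'],
    'Reference': ['student', 'student', 'student', 'teacher', 'student', 'teacher'],
    'stats': [30, 20, 10, 50, 20, 30],
})

is_teacher = df['Reference'].eq('teacher')

# Keep only teacher values, then back-fill within each Class so every student
# row takes the values of the next teacher row below it in the same class
df['TeacherName'] = df['Name'].where(is_teacher).groupby(df['Class']).bfill()
teacher_stats = df['stats'].where(is_teacher).groupby(df['Class']).bfill()
df['Difference'] = teacher_stats - df['stats']

# Drop the teacher rows, as the loop version does
result = df[~is_teacher]
print(result)
```

Difference comes out as float because where introduces NaN before the back-fill; students with no teacher below them in their class end up with NaN.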

How can I run a vlookup function in SQL within the same table?

I'm fairly new to SQL and struggling to find a good way to run the following query.
I have a table that looks something like this:
NAME  JOB GRADE  MANAGER NAME
X     7          O
Y     6          X
Z     5          X
A     4          Z
B     3          Z
C     2          Z
In this table, it shows that Y and Z report into X, and A, B and C report into Z.
I want to create a computed column showing the grade of each person's most senior direct report, or "N/A" if they don't manage anyone. So that would look something like this:
NAME  JOB GRADE  MANAGER NAME  GRADE OF MOST SENIOR REPORT
X     7          O             6
Y     6          X             N/A
Z     5          X             4
A     4          Z             N/A
B     3          Z             N/A
C     2          Z             N/A
How would I do this?

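SELECT g.*,
       ISNULL(CONVERT(nvarchar, (SELECT MAX(g2.GRADE)
                                 FROM dbo.Grade g2
                                 WHERE g2.manager = g.NAME
                                   AND g2.NAME != g.NAME)), 'N/A') AS most_graded
FROM dbo.Grade g
MAX returns the highest grade among each person's direct reports. When someone has no reports, the subquery returns NULL and ISNULL replaces it with 'N/A'; the CONVERT to nvarchar is needed so the grade and the text 'N/A' share a compatible type.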
Input
X 7 O
y 6 X
Z 5 X
A 6 Z
C 2 Z
Output
X 7 O 6
y 6 X N/A
Z 5 X 6
A 6 Z N/A
C 2 Z N/A

Something like this:
select name, job_grade, manager_name,
(select max(job_grade) from grades g2
where g2.manager_name = g1.name) as grade_of_most_senior_report
from grades g1
order by name;
The above is ANSI SQL and should work on any DBMS.
SQLFiddle example: http://sqlfiddle.com/#!15/e0806/1
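To try the correlated-subquery answer locally, here is a sketch that runs it against an in-memory SQLite database from Python. The table and column names (grades, name, job_grade, manager_name) follow the answer above; wrapping the subquery in CAST and COALESCE is an added assumption, so that people without reports show 'N/A' as the question asks instead of NULL.

```python
import sqlite3

# Rows from the question: (name, job_grade, manager_name)
rows = [('X', 7, 'O'), ('Y', 6, 'X'), ('Z', 5, 'X'),
        ('A', 4, 'Z'), ('B', 3, 'Z'), ('C', 2, 'Z')]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE grades (name TEXT, job_grade INTEGER, manager_name TEXT)')
conn.executemany('INSERT INTO grades VALUES (?, ?, ?)', rows)

# Correlated subquery: for each person g1, the highest grade among rows naming g1 as manager.
# MAX over zero rows is NULL; CAST makes the result text so COALESCE can substitute 'N/A'.
query = """
SELECT g1.name, g1.job_grade, g1.manager_name,
       COALESCE(CAST((SELECT MAX(g2.job_grade)
                      FROM grades g2
                      WHERE g2.manager_name = g1.name) AS TEXT), 'N/A')
           AS grade_of_most_senior_report
FROM grades g1
ORDER BY g1.job_grade DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```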

Setting a min. range before fetching data

I have a table of localities identified by unique numbers. Each locality has a number of buildings, each with an ACTIVATED status of Y or N. I want to pick the localities that have at least 15 buildings with ACTIVATED = 'Y'.
Sample Data :
Locality ACTIVATED
1 Y
1 Y
1 N
1 N
1 N
1 N
2 Y
2 Y
2 Y
2 Y
2 Y
E.g. for this sample, I need the localities with at least 5 'Y' values in the ACTIVATED column.

SELECT l.*
FROM Localities l
WHERE (SELECT COUNT(*) FROM Building b
WHERE b.LocalityNumber = l.LocalityNumber
AND b.Activated = 'Y') >= 15
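The answer above assumes separate Localities and Building tables. If everything lives in the single table shown in the question, GROUP BY with HAVING is a common alternative. The sketch below runs it in an in-memory SQLite database; the table name buildings and its column names are hypothetical, and the threshold is the question's example value of 5.

```python
import sqlite3

# Sample data from the question: (locality, activated)
rows = [(1, 'Y'), (1, 'Y'), (1, 'N'), (1, 'N'), (1, 'N'), (1, 'N'),
        (2, 'Y'), (2, 'Y'), (2, 'Y'), (2, 'Y'), (2, 'Y')]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE buildings (locality INTEGER, activated TEXT)')  # hypothetical table
conn.executemany('INSERT INTO buildings VALUES (?, ?)', rows)

min_activated = 5  # the question's example; use 15 for the real requirement

# Keep only activated rows, count them per locality, and filter the groups with HAVING
query = """
SELECT locality, COUNT(*) AS activated_count
FROM buildings
WHERE activated = 'Y'
GROUP BY locality
HAVING COUNT(*) >= ?
ORDER BY locality
"""
result = conn.execute(query, (min_activated,)).fetchall()
print(result)
conn.close()
```

If only the number of qualifying localities is needed, the same query can be wrapped as SELECT COUNT(*) FROM (...).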
