I have a DataFrame that looks like this:
ID Name Supervisor SupervisorID
1 X Y
2 Y C
3 Z Y
4 C Y
5 V X
I need to fill in SupervisorID. A supervisor's ID can be found by looking up the supervisor's name in the Name column: for example, if the supervisor is Y, the row where Name is Y has ID 2, so the SupervisorID is 2. The DataFrame should look like this:
ID Name Supervisor SupervisorID
1 X Y 2
2 Y C 4
3 Z Y 2
4 C Y 2
5 V X 1
Do you have any idea how to solve this?
Thanks for your help and best regards.
Use Series.map with a lookup Series built by DataFrame.drop_duplicates, because Name may contain duplicates in real data and map needs a unique index:
df['SupervisorID']=df['Supervisor'].map(df.drop_duplicates('Name').set_index('Name')['ID'])
print (df)
ID Name Supervisor SupervisorID
0 1 X Y 2
1 2 Y C 4
2 3 Z Y 2
3 4 C Y 2
4 5 V X 1
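For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and the variable name `name_to_id` is only an illustrative choice.

```python
import pandas as pd

# Sample data rebuilt from the question (SupervisorID is what we want to compute).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Name": ["X", "Y", "Z", "C", "V"],
    "Supervisor": ["Y", "C", "Y", "Y", "X"],
})

# Lookup Series: index is Name, values are ID.
# drop_duplicates keeps the first row for each repeated name.
name_to_id = df.drop_duplicates("Name").set_index("Name")["ID"]

# Translate every supervisor name into the matching ID.
df["SupervisorID"] = df["Supervisor"].map(name_to_id)
print(df)
```

If a supervisor does not appear in the Name column, map leaves NaN in that row, so the column becomes float in that case.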
I have a data frame in pandas like this:
STATUS FEATURES
A [x,y,z]
A [t, y]
B [x,p,t]
B [x,p]
I want to count the frequency of the elements in the lists of features conditional on the status.
The desired output would be:
STATUS FEATURES FREQUENCY
A x 1
A y 2
A z 1
A t 1
B x 2
B t 1
B p 2
Use explode, then groupby with size:
s = df.explode('FEATURES').groupby(['STATUS', 'FEATURES']).size().reset_index(name='FREQUENCY')
Use DataFrame.explode and SeriesGroupBy.value_counts:
new_df = (df.explode('FEATURES')
            .groupby('STATUS')['FEATURES']
            .value_counts()
            .reset_index(name='FREQUENCY'))
print(new_df)
Output
STATUS FEATURES FREQUENCY
0 A y 2
1 A t 1
2 A x 1
3 A z 1
4 B p 2
5 B x 2
6 B t 1
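A self-contained sketch of the explode-then-count idea, with the sample frame from the question rebuilt by hand. It assumes FEATURES holds real Python lists, not strings that look like lists.

```python
import pandas as pd

# Sample data rebuilt from the question; each FEATURES cell is a Python list.
df = pd.DataFrame({
    "STATUS": ["A", "A", "B", "B"],
    "FEATURES": [["x", "y", "z"], ["t", "y"], ["x", "p", "t"], ["x", "p"]],
})

freq = (
    df.explode("FEATURES")               # one row per list element
      .groupby(["STATUS", "FEATURES"])   # group by status and element
      .size()                            # count rows in each group
      .reset_index(name="FREQUENCY")     # back to a flat DataFrame
)
print(freq)
```

If the lists were read from a CSV file they are probably stored as strings, and they would need to be parsed into lists before explode can split them.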
I am trying to compute the difference between rows in multi-level data by iterating over the values within each class. I have been trying different techniques from tutorials, as I am still new to Python and pandas.
What I want is the difference between the teacher's score and each student's score within a class.
dataframe:
Class,Name,Reference,stats
X,SHE,student,30
X,GHE,student,20
X,GMK,student,10
X,JKO,teacher,50
Y,HHH,student,20
Y,KLP,teacher,30
Output:
Class,teacher,student,difference
X,JKO,SHE,20
X,JKO,GHE,30
X,JKO,GMK,40
Y,KLP,HHH,10
Can anyone point me in the right direction? There can be more than one teacher in a class.
Thank you.
Just split your dataset into two DataFrames, one for students and one for teachers, then merge them on Class.
students = df[df.Reference == 'student'][['Class','Name','stats']]
teachers = df[df.Reference == 'teacher'][['Class','Name','stats']]
new_df = students.merge(teachers, on='Class', suffixes=('_student','_teacher'))
new_df['difference'] = new_df.stats_teacher - new_df.stats_student
print(new_df)
Class Name_student stats_student Name_teacher stats_teacher difference
0 X SHE 30 JKO 50 20
1 X GHE 20 JKO 50 30
2 X GMK 10 JKO 50 40
3 Y HHH 20 KLP 30 10
Use:
print (df)
Class Name Reference stats
0 X SHE student 30
1 X GHE student 20
2 X GMK student 10
3 X JKO teacher 50
4 X ABC teacher 100 <- added a second teacher to class X to cover the general case
5 Y HHH student 20
6 Y KLP teacher 30
df = (df.query("Reference == 'teacher'")
.merge(df.query("Reference == 'student'"), on='Class', suffixes=('_t','_s'))
.assign(difference=lambda x: x['stats_t'] - x['stats_s'])
.drop(['Reference_s','Reference_t','stats_s','stats_t'], axis=1)
.rename(columns={'Name_s':'student','Name_t':'teacher'})
)
print (df)
Class teacher student difference
0 X JKO SHE 20
1 X JKO GHE 30
2 X JKO GMK 40
3 X ABC SHE 70
4 X ABC GHE 80
5 X ABC GMK 90
6 Y KLP HHH 10
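Explanation:
Filter the teacher rows and the student rows with query
merge them on the Class column to get every teacher-student combination within each class
Add the new difference column with assign, subtracting the student's stats from the teacher's
Remove the unnecessary columns with drop
Finally, rename the remaining name columns with rename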
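The steps in the explanation can also be written one at a time, which makes the intermediate frames easy to inspect. This is only a sketch: the data is rebuilt by hand from the printed frame, and the names `teachers`, `students`, `pairs` and `out` are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the printed frame, including the extra teacher ABC.
df = pd.DataFrame({
    "Class": ["X", "X", "X", "X", "X", "Y", "Y"],
    "Name": ["SHE", "GHE", "GMK", "JKO", "ABC", "HHH", "KLP"],
    "Reference": ["student", "student", "student", "teacher", "teacher",
                  "student", "teacher"],
    "stats": [30, 20, 10, 50, 100, 20, 30],
})

# Step 1: filter the teacher rows and the student rows.
teachers = df.query("Reference == 'teacher'")
students = df.query("Reference == 'student'")

# Step 2: merge on Class, pairing every teacher with every student of the class.
pairs = teachers.merge(students, on="Class", suffixes=("_t", "_s"))

# Step 3: teacher score minus student score.
pairs["difference"] = pairs["stats_t"] - pairs["stats_s"]

# Steps 4 and 5: drop the helper columns and rename the name columns.
out = (
    pairs.drop(columns=["Reference_s", "Reference_t", "stats_s", "stats_t"])
         .rename(columns={"Name_t": "teacher", "Name_s": "student"})
)
print(out)
```

Because the merge is many-to-many within each class, the number of output rows per class is the number of teachers times the number of students.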
Below is a solution that uses several for loops, so there is likely a more optimized approach. (I will try to update this solution later.)
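import pandas as pd
df = pd.read_csv("student.csv")
# Row labels of the teacher rows (assumes the default 0..n-1 index)
ref = df[df['Reference'] == 'teacher'].index.values.astype(int)
df['TeacherName'] = 'NA'
df['Difference'] = 0
for i in range(len(ref)):
    if i == 0:
        # Rows from the top of the file up to the first teacher
        for j in range(ref[i] + 1):
            df.loc[j, 'TeacherName'] = df.loc[ref[i], 'Name']
            df.loc[j, 'Difference'] = df.loc[ref[i], 'stats'] - df.loc[j, 'stats']
    else:
        # Rows between the previous teacher and the current one
        for j in range(ref[i - 1] + 1, ref[i]):
            df.loc[j, 'TeacherName'] = df.loc[ref[i], 'Name']
            df.loc[j, 'Difference'] = df.loc[ref[i], 'stats'] - df.loc[j, 'stats']
# Show the result without the teacher rows
df[~df.index.isin(ref)]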
I collect the index of every row where df['Reference'] == 'teacher' into an array named ref; those rows are filtered out of df after the loops.
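One possible loop-free variant of the same idea is sketched below. It is not the answer author's code. It relies on the same assumption as the loops: every teacher row sits directly below the students it applies to. The data is rebuilt by hand from the question instead of being read from student.csv.

```python
import pandas as pd

# Sample data rebuilt from the question; each teacher row follows its students.
df = pd.DataFrame({
    "Class": ["X", "X", "X", "X", "Y", "Y"],
    "Name": ["SHE", "GHE", "GMK", "JKO", "HHH", "KLP"],
    "Reference": ["student", "student", "student", "teacher",
                  "student", "teacher"],
    "stats": [30, 20, 10, 50, 20, 30],
})

is_teacher = df["Reference"].eq("teacher")

# Keep values only on teacher rows, then back-fill so that each student row
# picks up the values of the next teacher row below it.
df["TeacherName"] = df["Name"].where(is_teacher).bfill()
teacher_stats = df["stats"].where(is_teacher).bfill()
df["Difference"] = teacher_stats - df["stats"]

# Drop the teacher rows themselves.
result = df[~is_teacher]
print(result)
```

Difference comes out as a float column because the intermediate Series contains NaN before the back-fill. Students that have no teacher row below them would keep NaN.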
I'm fairly new to SQL and struggling to find a good way to run the following query.
I have a table that looks something like this:
NAME JOB GRADE MANAGER NAME
X 7 O
Y 6 X
Z 5 X
A 4 Z
B 3 Z
C 2 Z
In this table, it shows that Y and Z report into X, and A, B and C report into Z.
I want to create a computed column showing the grade of each person's most senior direct report, or "N/A" if they don't manage anyone. That would look something like this:
NAME JOB GRADE MANAGER NAME GRADE OF MOST SENIOR REPORT
X 7 O 6
Y 6 X N/A
Z 5 X 4
A 4 Z N/A
B 3 Z N/A
C 2 Z N/A
How would I do this?
SELECT g.*,
       ISNULL(CONVERT(nvarchar, (SELECT MAX(g2.GRADE)
                                 FROM dbo.Grade g2
                                 WHERE g2.manager = g.NAME
                                   AND g2.NAME != g.NAME)), 'N/A') AS most_graded
FROM dbo.Grade g
MAX returns the highest grade among each person's direct reports, and ISNULL turns the NULL for people with no reports into 'N/A'.
Input
X 7 O
y 6 X
Z 5 X
A 6 Z
C 2 Z
Output
X 7 O 6
y 6 X N/A
Z 5 X 6
A 6 Z N/A
C 2 Z N/A
Something like this:
select name, job_grade, manager_name,
       (select max(job_grade) from grades g2
        where g2.manager_name = g1.name) as grade_of_most_senior_report
from grades g1
order by name;
The above is ANSI SQL and should work on any DBMS. People with no direct reports get NULL rather than 'N/A'.
SQLFiddle example: http://sqlfiddle.com/#!15/e0806/1
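To try the correlated subquery locally, here is a small sketch that runs it against an in-memory SQLite database from Python. The table name `grades` and its column names follow the query above, the rows come from the question, and the COALESCE/CAST wrapper that produces 'N/A' is an addition for illustration, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grades (name TEXT, job_grade INTEGER, manager_name TEXT)"
)
# Rows taken from the table in the question.
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [("X", 7, "O"), ("Y", 6, "X"), ("Z", 5, "X"),
     ("A", 4, "Z"), ("B", 3, "Z"), ("C", 2, "Z")],
)

# For each person, the subquery finds the highest grade among their direct
# reports; COALESCE replaces the NULL of non-managers with 'N/A'.
query = """
SELECT g1.name, g1.job_grade, g1.manager_name,
       COALESCE(CAST((SELECT MAX(g2.job_grade)
                      FROM grades g2
                      WHERE g2.manager_name = g1.name) AS TEXT), 'N/A')
           AS grade_of_most_senior_report
FROM grades g1
ORDER BY g1.name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The cast to text is needed only because the column mixes numeric grades with the 'N/A' label; without the label, the plain subquery is enough.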
I have a table of localities, each identified by a unique number. Each locality has numbered buildings with a status of Activated = Y or N. I want to pick the localities that have at least 15 buildings with Activated = 'Y'.
Sample Data :
Locality ACTIVATED
1 Y
1 Y
1 N
1 N
1 N
1 N
2 Y
2 Y
2 Y
2 Y
2 Y
E.g.: I need the localities that have at least 5 Y values in the ACTIVATED column.
SELECT l.*
FROM Localities l
WHERE (SELECT COUNT(*) FROM Building b
WHERE b.LocalityNumber = l.LocalityNumber
AND b.Activated = 'Y') >= 15
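If the data really lives in a single table, as in the sample, a GROUP BY with HAVING may be enough. The sketch below runs that variant on an in-memory SQLite database from Python. The table name `buildings`, its column names and the variable `min_active` are hypothetical, and the threshold is set to 5 to match the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table layout matching the sample data in the question.
conn.execute("CREATE TABLE buildings (locality INTEGER, activated TEXT)")
sample = [(1, "Y")] * 2 + [(1, "N")] * 4 + [(2, "Y")] * 5
conn.executemany("INSERT INTO buildings VALUES (?, ?)", sample)

min_active = 5  # use 15 for the real requirement

# Count the activated buildings per locality and keep only the localities
# that reach the threshold.
query = """
SELECT locality, COUNT(*) AS active_count
FROM buildings
WHERE activated = 'Y'
GROUP BY locality
HAVING COUNT(*) >= ?
"""
rows = conn.execute(query, (min_active,)).fetchall()
print(rows)  # [(2, 5)]
conn.close()
```

Locality 1 has only two activated buildings, so it is filtered out by the HAVING clause.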