I have a spreadsheet that holds student test scores for each week. It has 5 fields: name (column A), year (column B), level (column C), week (column D) and score (column E). Each student has their own block of rows on the sheet, and the blocks are separated by an empty row. The blocks vary in size (please see below).
I have code that sorts the score (column E) within each block from lowest to highest (see Fig 1, before the sort, and Fig 2, after the sort).
What I would like to do is insert another column between the week column and the score column that gives the position of each score after the sort, as shown in Fig 3 below. I think it would require some sort of RANK procedure and a loop. Notice also that a student's scores for certain weeks can sometimes be the same, so there will be a joint first (or second, third, etc.), as with john ellis, who is joint 3rd with two scores of 54, and phil simm, who has a joint 1st and a joint 4th.
Hope this makes sense. Any help much appreciated.
Below the spreadsheet figures I have also placed the code that I used for looping over the blocks and sorting column E, the score column.
BEFORE SORT(Fig1)
name year level week score
jill evans 5 2 10 56
jill evans 5 2 11 49
jill evans 5 2 12 77
jill evans 5 2 13 84
empty empty empty empty empty
john ellis 3 4 10 45
john ellis 3 4 11 54
john ellis 3 4 12 54
john ellis 3 4 13 29
john ellis 3 4 14 66
empty empty empty empty empty
phil simm 4 6 10 89
phil simm 4 6 11 76
phil simm 4 6 12 41
phil simm 4 6 13 41
phil simm 4 6 14 56
phil simm 4 6 15 59
phil simm 4 6 16 61
phil simm 4 6 17 61
AFTER SORT(Fig2)
name year level week score
jill evans 5 2 11 49
jill evans 5 2 10 56
jill evans 5 2 12 77
jill evans 5 2 13 84
empty empty empty empty empty
john ellis 3 4 13 29
john ellis 3 4 10 45
john ellis 3 4 11 54
john ellis 3 4 12 54
john ellis 3 4 14 66
empty empty empty empty empty
phil simm 4 6 12 41
phil simm 4 6 13 41
phil simm 4 6 14 56
phil simm 4 6 15 59
phil simm 4 6 16 61
phil simm 4 6 17 61
phil simm 4 6 11 76
phil simm 4 6 10 89
FIG3 with the position column included between the week column and the score column
name year level week position score
jill evans 5 2 11 1 49
jill evans 5 2 10 2 56
jill evans 5 2 12 3 77
jill evans 5 2 13 4 84
empty empty empty empty empty empty
john ellis 3 4 13 1 29
john ellis 3 4 10 2 45
john ellis 3 4 11 3 54
john ellis 3 4 12 3 54
john ellis 3 4 14 4 66
empty empty empty empty empty empty
phil simm 4 6 12 1 41
phil simm 4 6 13 1 41
phil simm 4 6 14 2 56
phil simm 4 6 15 3 59
phil simm 4 6 16 4 61
phil simm 4 6 17 4 61
phil simm 4 6 11 5 76
phil simm 4 6 10 6 89
So the position column reflects the new position of the score after the sort.
If two scores are the same then they share a joint position, as with john ellis, who is joint 3rd with two scores of 54, and phil simm, who has a joint 1st and a joint 4th.
Hope this makes sense. Any help much appreciated.
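The numbering shown in Fig 3 is what is usually called a dense rank: tied scores share a position and the next distinct score takes the next number, with no gaps. As a rough illustration only (Python rather than VBA, with a hypothetical `add_positions` helper and the Fig 1 values typed in by hand), the logic could be sketched like this:

```python
from itertools import groupby

# (name, week, score) rows copied from Fig 1; the blank separator rows are left out.
rows = [
    ("jill evans", 10, 56), ("jill evans", 11, 49),
    ("jill evans", 12, 77), ("jill evans", 13, 84),
    ("john ellis", 10, 45), ("john ellis", 11, 54), ("john ellis", 12, 54),
    ("john ellis", 13, 29), ("john ellis", 14, 66),
    ("phil simm", 10, 89), ("phil simm", 11, 76), ("phil simm", 12, 41),
    ("phil simm", 13, 41), ("phil simm", 14, 56), ("phil simm", 15, 59),
    ("phil simm", 16, 61), ("phil simm", 17, 61),
]


def add_positions(rows):
    """Sort each student's block by score (ascending) and attach a dense position."""
    ranked = []
    # Each student's rows are contiguous, so groupby yields one block per student.
    for name, block in groupby(rows, key=lambda r: r[0]):
        position, previous = 0, None
        for _, week, score in sorted(block, key=lambda r: r[2]):
            if score != previous:  # a new distinct score moves to the next position
                position += 1
                previous = score
            ranked.append((name, week, position, score))
    return ranked


for row in add_positions(rows):
    print(row)
```

The same idea carries over to a VBA loop: walk down each sorted block and only increase the counter when the score differs from the row above.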
Sub sortone()
    Application.ScreenUpdating = False
    Dim Area As Range, sr As Long, er As Long
    For Each Area In Range("A2", Range("E" & Rows.Count).End(xlUp)).SpecialCells(xlCellTypeConstants).Areas
        With Area
            sr = .Row
            er = sr + .Rows.Count - 1
            Range("A" & sr & ":E" & er).Sort key1:=Range("E" & sr), order1:=1
        End With
    Next Area
    Application.ScreenUpdating = True
End Sub
many thanks
"RankIf" conditional rank by Subsets using SUMPRODUCT:
An alternate way to rank, kind of like if there were a RANKIF function, uses SUMPRODUCT to do conditional ranks:
Formula in D5:
=1+SUMPRODUCT((A$4:A$21=A5)*($B$4:$B$21>B5))
...the absolute/relative cell references are set up so that the formula can be copied or filled down and right.
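To make the SUMPRODUCT logic easier to follow, here is a small Python sketch of the same counting rule (the `rank_if` name is made up for illustration). For each row it counts the rows in the same group with a strictly larger value and adds 1, which is what the two multiplied conditions in the formula do:

```python
def rank_if(groups, values):
    """Return 1 + the number of rows in the same group with a strictly larger value."""
    return [
        1 + sum(1 for g2, v2 in zip(groups, values) if g2 == g and v2 > v)
        for g, v in zip(groups, values)
    ]


# Sample values taken from the john ellis and jill evans blocks in Fig 1.
groups = ["john ellis"] * 5 + ["jill evans"] * 4
values = [45, 54, 54, 29, 66, 56, 49, 77, 84]

print(rank_if(groups, values))  # [4, 2, 2, 5, 1, 3, 4, 2, 1]
```

Two things are worth noticing: the largest score gets rank 1 (swapping `>` for `<` would rank the smallest first), and ties leave a gap afterwards (2, 2, 4), which differs from the gap-free numbering in Fig 3.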
More Information:
Office.com : SUMPRODUCT Function (Excel)
Office.com : RANK Function (Excel)
MSDN : WorksheetFunction.Rank Method (VBA/Excel)
Office.com : Excel statistical functions: Representing ties by using RANK
You could try this code:
Option Explicit
Sub sortone()
    Dim Area As Range
    Application.ScreenUpdating = False
    With Range("A1", Range("A" & Rows.Count).End(xlUp))
        .Columns(5).Insert
        .Cells(1, 5).Value = "position"
        For Each Area In .Resize(.Rows.Count - 1).Offset(1).SpecialCells(xlCellTypeConstants).Areas
            With Area
                .Resize(, 6).Sort key1:=.Range("F1"), order1:=1
                .Offset(, 4).FormulaR1C1 = "=RANK(RC[1]," & .Offset(, 5).Resize(, 1).Address(, , xlR1C1) & ",1)"
            End With
        Next Area
    End With
    Application.ScreenUpdating = True
End Sub
name year level week position score
john ellis 3 4 13 1 29
phil simm 4 6 12 2 41
phil simm 4 6 13 2 41
john ellis 3 4 10 4 45
jill evans 5 2 11 5 49
john ellis 3 4 11 6 54
john ellis 3 4 12 6 54
jill evans 5 2 10 8 56
phil simm 4 6 14 8 56
phil simm 4 6 15 10 59
phil simm 4 6 16 11 61
phil simm 4 6 17 11 61
john ellis 3 4 14 13 66
phil simm 4 6 11 14 76
jill evans 5 2 12 15 77
jill evans 5 2 13 16 84
phil simm 4 6 10 17 89
empty empty empty empty #VALUE! empty
empty empty empty empty #VALUE! empty
I have a dataframe with MultiIndex columns. I want to preserve the existing index, but move one level of the column MultiIndex so that it becomes a sublevel of the index instead.
I can't figure out the correct incantation of melt/stack/unstack/pivot to get from what I have to what I want. Calling unstack() turned things into a Series and lost the original date index.
import numpy as np
import pandas as pd

names = ['mike', 'matt', 'dave']
details = ['bla', 'foo', ]
columns = pd.MultiIndex.from_tuples((n,d) for n in names for d in details)
index = pd.date_range(start="2022-10-30", end="2022-11-3" ,freq="d", )
have = pd.DataFrame(np.random.randint(0,100, size = (5,6)), index=index, columns=columns)
have
want_columns = details
want_index = pd.MultiIndex.from_product([index, names])
want = pd.DataFrame(np.random.randint(0,100, size = (15,2)), index=want_index, columns=want_columns)
want
Use DataFrame.stack with level=0:
print (have.stack(level=0))
bla foo
2022-10-30 dave 88 18
matt 49 55
mike 92 45
2022-10-31 dave 33 27
matt 53 41
mike 24 16
2022-11-01 dave 48 19
matt 94 75
mike 11 19
2022-11-02 dave 16 90
matt 14 93
mike 38 72
2022-11-03 dave 80 15
matt 97 2
mike 11 94
I am new to pandas, and recently I have been stuck on a question.
I need to find the name of the person who has the lowest score, but I just don't know how.
df =
name score subject
0 Amy 100
1 Amy 99
3 Amy 95
4 Bob 98
5 Bob 88
6 Bob 85
7 Cathy 94
8 Cathy 87
9 Cathy 90
It would be so great if anyone can help.
You can use the min() function:
df.loc[df.score==df.score.min(), 'name']
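As a quick check of the line above, here is a minimal sketch that rebuilds the sample frame (the empty subject column is left out, and the variable names `lowest_names` and `first_lowest` are only illustrative). It also shows `idxmin` as an alternative when only the first lowest row is needed:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Amy", "Amy", "Amy", "Bob", "Bob", "Bob", "Cathy", "Cathy", "Cathy"],
        "score": [100, 99, 95, 98, 88, 85, 94, 87, 90],
    }
)

# Boolean mask: keeps every row whose score equals the minimum, so ties are all returned.
lowest_names = df.loc[df["score"] == df["score"].min(), "name"].tolist()

# idxmin gives the index label of the first minimum only.
first_lowest = df.loc[df["score"].idxmin(), "name"]

print(lowest_names)  # ['Bob']
print(first_lowest)  # Bob
```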
I have a dataset, in which it has a lot of entries for a single location. I am trying to find a way to sum up all of those entries without affecting any of the other columns. So, just in case I'm not explaining it well enough, I want to use a dataset like this:
Locations Cyclists maleRunners femaleRunners maleCyclists femaleCyclists
Bedford 10 12 14 17 27
Bedford 11 40 34 9 1
Bedford 7 1 2 3 3
Leeds 1 1 2 0 0
Leeds 20 13 6 1 1
Bath 101 20 33 41 3
Bath 11 2 3 1 0
And turn it into something like this:
Locations Cyclists maleRunners femaleRunners maleCyclists femaleCyclists
Bedford 28 53 50 29 31
Leeds 21 14 8 1 1
Bath 112 22 36 42 3
Now, I have read up that a groupby should work in a way, but from my understanding a group by will change it into 2 columns and I don't particularly want to make hundreds of 2 columns and then merge it all. Surely there's a much simpler way to do this?
IIUC, groupby+sum will work for you:
df.groupby('Locations',as_index=False,sort=False).sum()
Output:
Locations Cyclists maleRunners femaleRunners maleCyclists femaleCyclists
0 Bedford 28 53 50 29 31
1 Leeds 21 14 8 1 1
2 Bath 112 22 36 42 3
Pivot table should work for you.
new_df = pd.pivot_table(df, values=['Cyclists', 'maleRunners', 'femaleRunners',
                                    'maleCyclists', 'femaleCyclists'], index='Locations', aggfunc=np.sum)
Basically, I just want to map values from one dataframe to another based on a common column ('ID' and 'Key').
df:
ID Name
1 Sam
2 Ryan
4 Sam
16 Brian
7 Tom
8 Gemma
9 Steve
11 Sarah
df1:
Key PPID M
1 22 MM
2 23 R
4 25 MM
16 27 RR
7 21 RR
8 11 R
0 13 SS
11 14 RR
new df:
ID PPID M
Sam 22 MM
Ryan 23 R
Sam 25 MM
Brian 27 RR
Tom 21 RR
Gemma 11 R
0 13 SS
Sarah 14 RR
IIUC
df1['Key'] = df1['Key'].replace(dict(zip(df.ID, df.Name)))
df1
Key PPID M
0 Sam 22 MM
1 Ryan 23 R
2 Sam 25 MM
3 Brian 27 RR
4 Tom 21 RR
5 Gemma 11 R
6 0 13 SS
7 Sarah 14 RR
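For anyone who wants to run this end to end, here is a small self-contained sketch using the sample data from the question. The frame names `names` and `keys` and the `lookup` dict are placeholders chosen for this example. It uses a plain dictionary lookup with a fallback, so a key with no match (0 here) is kept as it is:

```python
import pandas as pd

names = pd.DataFrame(
    {
        "ID": [1, 2, 4, 16, 7, 8, 9, 11],
        "Name": ["Sam", "Ryan", "Sam", "Brian", "Tom", "Gemma", "Steve", "Sarah"],
    }
)
keys = pd.DataFrame(
    {
        "Key": [1, 2, 4, 16, 7, 8, 0, 11],
        "PPID": [22, 23, 25, 27, 21, 11, 13, 14],
        "M": ["MM", "R", "MM", "RR", "RR", "R", "SS", "RR"],
    }
)

# Build an ID -> Name lookup, then translate each key, falling back to the key itself.
lookup = dict(zip(names["ID"], names["Name"]))
result = keys.copy()
result["Key"] = [lookup.get(k, k) for k in keys["Key"]]

print(result)
```

Working on a copy leaves the original frame untouched, which makes it easier to compare before and after.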
This is a trivial example, but I am trying to understand how to think creatively using SQL.
For example, I have the following tables below, and I want to query the names of folks who have three or more questions. How can I do this without using HAVING or COUNT? I wonder if this is possible using JOINS or something similar?
FOLKS
folkID name
---------- --------------
01 Bill
02 Joe
03 Amy
04 Mike
05 Chris
06 Elizabeth
07 James
08 Ashley
QUESTION
folkID questionRating questionDate
---------- ---------- ----------
01 2 2011-01-22
01 4 2011-01-27
02 4
03 2 2011-01-20
03 4 2011-01-12
03 2 2011-01-30
04 3 2011-01-09
05 3 2011-01-27
05 2 2011-01-22
05 4
06 3 2011-01-15
06 5 2011-01-19
07 5 2011-01-20
08 3 2011-01-02
Using SUM or CASE seems to be cheating to me!
I'm not sure if it's possible in your current formulation, but if you add a primary key to the question table (questionid) then the following seems to work:
SELECT DISTINCT Folks.folkid, Folks.name
FROM ((Folks
INNER JOIN Question AS Question_1 ON Folks.folkid = Question_1.folkid)
INNER JOIN Question AS Question_2 ON Folks.folkid = Question_2.folkid)
INNER JOIN Question AS Question_3 ON Folks.folkid = Question_3.folkid
WHERE (((Question_1.questionid) <> [Question_2].[questionid] And
(Question_1.questionid) <> [Question_3].[questionid]) AND
(Question_2.questionid) <> [Question_3].[questionid]);
Sorry, this is in MS Access SQL, but it should translate to any flavour of SQL.
Returns:
folkid name
3 Amy
5 Chris
Update: Just to explain why this works. Each join returns all the question ids asked by that person, so the three joins together produce every combination of three question ids. The WHERE clause then keeps only the rows in which all three question ids are different. If fewer than three questions were asked, no such row exists.
For example, Bill:
folkid name Question_3.questionid Question_1.questionid Question_2.questionid
1 Bill 1 1 1
1 Bill 1 1 2
1 Bill 1 2 1
1 Bill 1 2 2
1 Bill 2 1 1
1 Bill 2 1 2
1 Bill 2 2 1
1 Bill 2 2 2
There are no rows where all the ids are different.
However, for Amy:
folkid name Question_3.questionid Question_1.questionid Question_2.questionid
3 Amy 4 4 5
3 Amy 4 4 4
3 Amy 4 4 6
3 Amy 4 5 4
3 Amy 4 5 5
3 Amy 4 5 6
3 Amy 4 6 4
3 Amy 4 6 5
3 Amy 4 6 6
3 Amy 5 4 4
3 Amy 5 4 5
3 Amy 5 4 6
3 Amy 5 5 4
3 Amy 5 5 5
3 Amy 5 5 6
3 Amy 5 6 4
3 Amy 5 6 5
3 Amy 5 6 6
3 Amy 6 4 4
3 Amy 6 4 5
3 Amy 6 4 6
3 Amy 6 5 4
3 Amy 6 5 5
3 Amy 6 5 6
3 Amy 6 6 4
3 Amy 6 6 5
3 Amy 6 6 6
There are several rows which have different ids and hence these get returned by the above query.
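The self-join idea can be tried out with Python's built-in sqlite3 module. This is only a sketch: the `questionID` column is the assumed extra primary key mentioned above (it is not in the original table), the table and column names are lower-cased for convenience, and the missing dates are stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE folks (folkID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE question (
        questionID INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed surrogate key
        folkID INTEGER,
        questionRating INTEGER,
        questionDate TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO folks VALUES (?, ?)",
    [(1, "Bill"), (2, "Joe"), (3, "Amy"), (4, "Mike"),
     (5, "Chris"), (6, "Elizabeth"), (7, "James"), (8, "Ashley")],
)
conn.executemany(
    "INSERT INTO question (folkID, questionRating, questionDate) VALUES (?, ?, ?)",
    [(1, 2, "2011-01-22"), (1, 4, "2011-01-27"), (2, 4, None),
     (3, 2, "2011-01-20"), (3, 4, "2011-01-12"), (3, 2, "2011-01-30"),
     (4, 3, "2011-01-09"), (5, 3, "2011-01-27"), (5, 2, "2011-01-22"),
     (5, 4, None), (6, 3, "2011-01-15"), (6, 5, "2011-01-19"),
     (7, 5, "2011-01-20"), (8, 3, "2011-01-02")],
)

# Three copies of question per person; keep rows where all three ids differ.
query = """
SELECT DISTINCT f.folkID, f.name
FROM folks AS f
JOIN question AS q1 ON q1.folkID = f.folkID
JOIN question AS q2 ON q2.folkID = f.folkID
JOIN question AS q3 ON q3.folkID = f.folkID
WHERE q1.questionID <> q2.questionID
  AND q1.questionID <> q3.questionID
  AND q2.questionID <> q3.questionID
ORDER BY f.folkID
"""
result = conn.execute(query).fetchall()
print(result)  # [(3, 'Amy'), (5, 'Chris')]
```

Neither COUNT nor HAVING appears in the query; the three-way join does the counting implicitly, at the cost of a row set that grows with the cube of each person's question count.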
You can try SUM to replace COUNT:
SELECT SUM(CASE WHEN field_name >= 3 THEN field_name ELSE 0 END)
FROM table_name
SELECT f.*
FROM (
    SELECT DISTINCT
        COUNT(*) OVER (PARTITION BY q.folkID) AS [Count] --count questions for folks
        ,q.folkID
    FROM QUESTION AS q
) AS p
INNER JOIN FOLKS AS f ON f.folkID = p.folkID
WHERE p.[Count] >= 3