I have this table:
Genres Episodes
Drama 2
Drama 1
Drama 0
Comedy 2
Comedy 1
Comedy 0
Kids 3
Kids 2
Kids 1
Kids 0
I want to create a calculated field, Criteria, that gives:
Genres Episodes Criteria
Drama 2 Complete
Drama 1 Partial
Drama 0 Obsolete
Comedy 2 Complete
Comedy 1 Partial
Comedy 0 Obsolete
Kids 3 Complete
Kids 2 Partial
Kids 1 Partial
Kids 0 Obsolete
My calculated field formula throws an error:
ifelse({Genres} = 'Drama' and (count({Episodes}) = 2), 'Complete',
{Genres} = 'Drama' and (count({Episodes}) = 0), 'Obsolete',
{Genres} = 'Drama' and (count({Episodes}) = 1), 'Partial',
{Genres} = 'Comedy' and (count({Episodes}) = 2), 'Complete',
{Genres} = 'Comedy' and (count({Episodes}) = 0), 'Obsolete',
{Genres} = 'Comedy' and (count({Episodes}) = 1), 'Partial',
{Genres} = 'Kids' and (count({Episodes}) = 3), 'Complete',
{Genres} = 'Kids' and (count({Episodes}) = 0), 'Obsolete','Partial')
Error:
Mismatched Aggregation. Custom Aggregations cannot contain both aggregate Count and non-Aggregated fields.
Can someone help me with this?
If anyone else runs into this mismatched-aggregation error, here is the workaround I found.
First, convert the genre string into a number with a calculated field:
genres_numbers:
ifelse({Genres} = 'Drama',1, {Genres} = 'Comedy',2, 3)
Second, create a count calculation for the Episodes column:
count_episodes:
count({Episodes})
Then take the average of both fields and assign the Criteria values:
ifelse(avg({genres_numbers}) = 1 and avg({count_episodes}) = 2, 'Complete'...)
This worked for me :)
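If, as the expected table suggests, Criteria depends only on each row's own Genres and Episodes values, the rule needs no aggregation at all. A minimal Python sketch of that row-level rule, assuming the per-genre "complete" counts read off the expected table (Drama 2, Comedy 2, Kids 3); `COMPLETE_AT` and `criteria` are hypothetical names, not QuickSight functions:

```python
# Hypothetical per-genre episode count that means "Complete",
# taken from the expected Criteria table above.
COMPLETE_AT = {"Drama": 2, "Comedy": 2, "Kids": 3}

def criteria(genre: str, episodes: int) -> str:
    """Classify one row from its own values, with no aggregation."""
    if episodes == 0:
        return "Obsolete"
    if episodes == COMPLETE_AT[genre]:
        return "Complete"
    return "Partial"  # same fallback as the last ifelse branch

rows = [("Drama", 2), ("Drama", 1), ("Drama", 0),
        ("Comedy", 2), ("Comedy", 1), ("Comedy", 0),
        ("Kids", 3), ("Kids", 2), ("Kids", 1), ("Kids", 0)]
for genre, episodes in rows:
    print(genre, episodes, criteria(genre, episodes))
```

In QuickSight terms, this would correspond to comparing {Episodes} directly inside ifelse rather than count({Episodes}), so the formula no longer mixes aggregated and non-aggregated fields.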
I have a DataFrame and need to select rows based on conditions in multiple columns.
Here is what I have:
import pandas as pd
dat = [('p','q', 5), ('k','j', 2), ('p','-', 5), ('-','p', 4), ('q','pkjq', 3), ('pkjq','q', 2)]
df = pd.DataFrame(dat, columns = ['a', 'b', 'c'])
df_dat = df[(df[['a','b']].isin(['k','p','q','j']) & df['c'] > 3)] | df[(~df[['a','b']].isin(['k','p','q','j']) & df['c'] > 2 )]
Expected result = [('p','q', 5), ('p','-', 5), ('-','p', 4), ('q','pkjq', 3)]
The result I get is an all-False DataFrame.
When the condition is complicated, I recommend building it outside the slice:
cond1 = df[['a','b']].isin(['k','p','q','j']).any(axis=1) & df['c'].gt(3)
cond2 = (~df[['a','b']].isin(['k','p','q','j'])).any(axis=1) & df['c'].gt(2)
out = df.loc[cond1 | cond2]
Output:
a b c
0 p q 5
2 p - 5
3 - p 4
4 q pkjq 3
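The all-False result in the question comes from operator precedence: in Python, `&` binds more tightly than comparisons such as `>`, so `df[['a','b']].isin([...]) & df['c'] > 3` is grouped as `(df[['a','b']].isin([...]) & df['c']) > 3`. pandas overloads these operators but cannot change their precedence, so the same grouping can be seen with plain integers:

```python
# `&` binds more tightly than `>`, so this groups as (1 & 5) > 3.
unparenthesized = 1 & 5 > 3
explicit = (1 & 5) > 3
# Parenthesizing the comparison gives the intended grouping.
intended = 1 & (5 > 3)

print(unparenthesized, explicit, intended)  # False False 1
```

This is why the answer wraps each comparison in a method call (`df['c'].gt(3)`); explicit parentheses such as `(df['c'] > 3)` work equally well.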
Let us begin by calculating the attendence_score of each student. Do the following:
Create a new column called attendence_score.
Fill in the column using the following criteria:
No Absence = 5
1-5 Absences = 4
6-10 Absences = 3
11-15 Absences = 2
16-20 Absences = 1
21 or more Absences = 0
The dataset has a column named absences.
My idea is to use an if condition to do this.
But most of the code I found here fills the new column with NaN. How can I fix this in my case?
The manual way:
s = df['absences']
df.loc[s == 0, 'absence_score'] = 5
df.loc[s.between(1, 5), 'absence_score'] = 4
df.loc[s.between(6, 10), 'absence_score'] = 3
df.loc[s.between(11, 15), 'absence_score'] = 2
df.loc[s.between(16, 20), 'absence_score'] = 1
df.loc[s >= 21, 'absence_score'] = 0
Using pd.cut (the result is a categorical column):
import numpy as np
import pandas as pd
df['absence_score'] = pd.cut(df['absences'], [-np.inf, 0, 5, 10, 15, 20, np.inf], labels=range(5, -1, -1))
Or you can take advantage of the uniform step across the levels and use a math formula:
df['absence_score'] = 5 - np.ceil(df['absences'].div(5).clip(upper=5)).astype('int')
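The pd.cut answer uses right-closed bins: (-inf, 0], (0, 5], ..., (20, inf). The same lookup can be sketched with the standard-library bisect module, which makes the band boundaries explicit. This is a swapped-in technique, assuming non-negative integer counts; `absence_score` is a hypothetical helper name:

```python
from bisect import bisect_left

# Upper bound of each band: 0 | 1-5 | 6-10 | 11-15 | 16-20 | 21+
UPPER_BOUNDS = [0, 5, 10, 15, 20]

def absence_score(absences: int) -> int:
    """Map an absence count to a score from 5 down to 0."""
    # bisect_left counts the bounds strictly below `absences`,
    # which is the index of the band the value falls into.
    return 5 - bisect_left(UPPER_BOUNDS, absences)

for n in [0, 1, 5, 6, 10, 11, 15, 16, 20, 21, 40]:
    print(n, absence_score(n))
```

Because each bound is inclusive on the right, a count of exactly 5, 10, 15, or 20 stays in the lower band, matching the criteria list.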
import numpy as np
# np.select template: one boolean condition per tier
conditions = [
(df['likes_count'] <= 2),
(df['likes_count'] > 2) & (df['likes_count'] <= 9),
(df['likes_count'] > 9) & (df['likes_count'] <= 15),
(df['likes_count'] > 15)
]
# create a list of the values we want to assign for each condition
values = ['tier_4', 'tier_3', 'tier_2', 'tier_1']
# create a new column and use np.select to assign values to it using our lists as arguments
df['tier'] = np.select(conditions, values)
# display updated DataFrame
df.head()
Or like this?
import numpy as np
df = student  # 'student' is assumed to be the DataFrame loaded earlier
print(df)
# Single-case alternatives (comparison needs ==, not =):
#df['attendence_score'] = np.where(df['absences'] == 0, 5, df['attendence_score'])
#df.loc[df['absences'] == 0, 'attendence_score'] = 5
attendence_score = [
(df['absences'] == 0),
(df['absences'] > 0) & (df['absences'] <= 5),
(df['absences'] > 5) & (df['absences'] <= 10),
(df['absences'] > 10) & (df['absences'] <= 15),
(df['absences'] > 15) & (df['absences'] <= 20),
(df['absences'] > 20)
]
# create a list of the numeric values we want to assign for each condition
values = [5, 4, 3, 2, 1, 0]
# create a new column and use np.select to assign values to it using our lists as arguments
df['attendence_score'] = np.select(attendence_score, values)
# display updated DataFrame
df.head()
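With chained range conditions it is easy to leave a gap: a last condition written as `> 21` matches neither 21 nor anything below it, so rows with exactly 21 absences fall through to np.select's `default`. A small stdlib check, using a hypothetical helper `find_gaps_and_overlaps` that is not part of the original answer, restates the conditions as plain predicates and looks for counts covered by none or by several:

```python
# The np.select conditions above, restated as predicates on one count.
conditions = [
    lambda n: n == 0,
    lambda n: 0 < n <= 5,
    lambda n: 5 < n <= 10,
    lambda n: 10 < n <= 15,
    lambda n: 15 < n <= 20,
    lambda n: n > 20,
]

def find_gaps_and_overlaps(conds, values):
    """Return (values matched by no condition, values matched by several)."""
    gaps, overlaps = [], []
    for n in values:
        hits = sum(1 for cond in conds if cond(n))
        if hits == 0:
            gaps.append(n)
        elif hits > 1:
            overlaps.append(n)
    return gaps, overlaps

print(find_gaps_and_overlaps(conditions, range(50)))  # ([], [])

# Writing the last condition as n > 21 leaves exactly 21 uncovered.
buggy = conditions[:-1] + [lambda n: n > 21]
print(find_gaps_and_overlaps(buggy, range(50)))  # ([21], [])
```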
I solved it myself in the end.
I have a dataset like this:
df = pd.DataFrame({
"333-0": [123,123,123],
"5985-0.0": [1,2,3],
"5985-0.1":[1,2,3],
"5985-0.2":[1,2,3]
},
index = [0,1,2] )
Here, we have three columns ["5985-0.0", "5985-0.1", "5985-0.2"] that represent the first, second and third float readings of thing 5985-0 -- i.e. .x represents an array index.
I'd like to take multiple columns and collapse them into a single column 5985-0 containing some kind of list of floats, which I can do like this:
srccols = ["5985-0.0", "5985-0.1", "5985-0.2"]
df["5985-0"] = df[srccols].apply(tuple, axis=1)
df = df.drop(columns=srccols)
333-0 5985-0
0 123 (1, 1, 1)
1 123 (2, 2, 2)
2 123 (3, 3, 3)
which I can then store as an SQL table with an array column.
However, apply(tuple) is very slow. Is there a faster, more idiomatic pandas way to combine multiple columns into one?
(First person to say "normalized" gets a downvote).
My Choice
Assuming I know the columns
thing = '5985-0'
cols = ['5985-0.0', '5985-0.1', '5985-0.2']
v = df.values
l = [v[:, df.columns.get_loc(c)].tolist() for c in cols]
s = pd.Series(list(zip(*l)), name=thing)
df.drop(columns=cols).join(s)
333-0 5985-0
0 123 (1, 1, 1)
1 123 (2, 2, 2)
2 123 (3, 3, 3)
Base Case
Using filter, join, and apply(tuple, axis=1)
thing = '5985-0'
d = df.filter(like=thing)
s = d.apply(tuple, axis=1).rename(thing)
cols = d.columns
df.drop(columns=cols).join(s)
333-0 5985-0
0 123 (1, 1, 1)
1 123 (2, 2, 2)
2 123 (3, 3, 3)
Option 2
Using filter, join, pd.Series
thing = '5985-0'
d = df.filter(like=thing)
s = pd.Series(d.values.tolist(), name=thing)
cols = d.columns
df.drop(columns=cols).join(s)
333-0 5985-0
0 123 [1, 1, 1]
1 123 [2, 2, 2]
2 123 [3, 3, 3]
Option 3
Using filter, join, pd.Series, and zip
thing = '5985-0'
d = df.filter(like=thing)
s = pd.Series(list(zip(*d.values.T)), name=thing)
cols = d.columns
print(df.drop(columns=cols).join(s))
333-0 5985-0
0 123 (1, 1, 1)
1 123 (2, 2, 2)
2 123 (3, 3, 3)
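All three options pick the group with `df.filter(like=thing)`, which is a substring match, so it would also catch a column such as `15985-0.0`. When there are many such groups, one option is to derive them from the column names instead. A stdlib sketch, assuming every array column is named `<id>.<index>` with an integer index; `group_array_columns` and `collapse` are hypothetical helpers working on a plain dict of lists and mirroring Option 3's zip:

```python
from collections import defaultdict

def group_array_columns(columns):
    """Map '<id>' -> ['<id>.0', '<id>.1', ...], ordered by the integer index."""
    groups = defaultdict(list)
    for col in columns:
        base, sep, idx = col.rpartition(".")
        if sep and idx.isdigit():  # only names ending in '.<digits>'
            groups[base].append((int(idx), col))
    return {base: [col for _, col in sorted(items)] for base, items in groups.items()}

def collapse(data, groups):
    """Replace each column group in a dict of lists with one column of tuples."""
    grouped = {col for cols in groups.values() for col in cols}
    out = {name: values for name, values in data.items() if name not in grouped}
    for base, cols in groups.items():
        # zip walks the group's columns in lockstep, yielding one tuple per row
        out[base] = list(zip(*(data[col] for col in cols)))
    return out

data = {
    "333-0": [123, 123, 123],
    "5985-0.0": [1, 2, 3],
    "5985-0.1": [1, 2, 3],
    "5985-0.2": [1, 2, 3],
}
groups = group_array_columns(data)
print(groups)  # {'5985-0': ['5985-0.0', '5985-0.1', '5985-0.2']}
print(collapse(data, groups))
```

Each list in `groups` could also be used as `df[cols]` in place of `df.filter(like=thing)` in any of the options above.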
Timing
Large Data Set
df = pd.concat([df] * 10000, ignore_index=True)
%%timeit
thing = '5985-0'
d = df.filter(like=thing)
s = d.apply(tuple, axis=1).rename(thing)
cols = d.columns
df.drop(columns=cols).join(s)
1 loop, best of 3: 350 ms per loop
%%timeit
thing = '5985-0'
cols = ['5985-0.0', '5985-0.1', '5985-0.2']
k = len(cols)
v = df.values
l = [v[:, df.columns.get_loc(c)].tolist() for c in cols]
s = pd.Series(list(zip(*l)), name=thing)
df.drop(columns=cols).join(s)
100 loops, best of 3: 4.06 ms per loop
%%timeit
thing = '5985-0'
d = df.filter(like=thing)
s = pd.Series(d.values.tolist(), name=thing)
cols = d.columns
df.drop(columns=cols).join(s)
100 loops, best of 3: 4.56 ms per loop
%%timeit
thing = '5985-0'
d = df.filter(like=thing)
s = pd.Series(list(zip(*d.values.T)), name=thing)
cols = d.columns
df.drop(columns=cols).join(s)
100 loops, best of 3: 6.89 ms per loop
I have two spreadsheets, A and B, each with a person in column 1 and a date in column 2.
I want to compare both sheets and make sure the dates match for every person. For example, Person 1 has three different dates, and I want to match them exactly 1:1:
Person 1, 3/2/2015 (Table A) -> Person 1, 3/2/2015 (Table B)
Person 1, 3/5/2015 (Table A) -> Person 1, 3/5/2015 (Table B)
Person 1, 3/6/2015 (Table A) -> Person 1, 3/6/2015 (Table B)
Currently I loop through the "No." column in Table A and use Application.VLookup on Table B, but that only works when a person has a single date. Otherwise it always compares against the first date for that person in Table B. See code:
For sheetArowCounter = 2 To sheetAlastRow
Sheets("A").Select
sheetAperson = Cells(sheetArowCounter, 1)
sheetAdate = Cells(sheetArowCounter, 2)
Sheets("B").Select
sheetBdate = Application.VLookup(sheetAperson, _
Sheets("B").Range(Cells(1, 1), Cells(sheetBLastRow, 2)), 2, False)
If IsError(sheetBdate) Then
personFromTableAnotFound = personFromTableAnotFound + 1
ElseIf sheetBdate <> sheetAdate Then
sheetAdateNotMatched = sheetAdateNotMatched + 1
End If
sheetAdateCompared = sheetAdateCompared + 1
Next sheetArowCounter
Any idea how I can do that?
I agree with Jeeped above: try using SUMIF. You can even nest the SUMIF in an IF statement to return text, i.e. "Match" or "No match".
I'd use the COUNTIFS function, which can test criteria against multiple columns at once. Here is an example of how you could use it in your code:
For sheetArowCounter = 2 To sheetAlastRow
Sheets("A").Select
sheetAperson = Cells(sheetArowCounter, 1)
sheetAdate = Cells(sheetArowCounter, 2)
Sheets("B").Select
'using the countifs function eg. =COUNTIFS(B!A3:A11,A!A3,B!B3:B11,A!B3)
PersonDateMatches = Application.WorksheetFunction.CountIfs(Sheets("B").Range(Cells(1, 1), Cells(sheetBLastRow, 1)), sheetAperson, Sheets("B").Range(Cells(1, 2), Cells(sheetBLastRow, 2)), sheetAdate)
If PersonDateMatches = 0 Then
personDateFromTableAnotFound = personDateFromTableAnotFound + 1
End If
sheetAdateCompared = sheetAdateCompared + 1
Next sheetArowCounter
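COUNTIFS answers "does this person/date pair exist in B?", but it does not consume matches, so two identical rows in A can both match a single row in B. If a strict 1:1 pairing matters, a multiset approach is one option. The sketch below is in Python rather than VBA, purely to show the idea; the sample data is hypothetical and dates are kept as plain strings. In VBA, the same idea could be built on a Scripting.Dictionary keyed on person and date.

```python
from collections import Counter

def match_one_to_one(table_a, table_b):
    """Pair each (person, date) row in A with a distinct identical row in B."""
    available = Counter(table_b)  # multiset of B rows not yet matched
    matched, unmatched = [], []
    for row in table_a:
        if available[row] > 0:
            available[row] -= 1  # consume the B row so it cannot match twice
            matched.append(row)
        else:
            unmatched.append(row)
    # Unary + keeps only positive counts: B rows that were never matched.
    return matched, unmatched, +available

# Hypothetical sample data
table_a = [("Person 1", "3/2/2015"), ("Person 1", "3/5/2015"),
           ("Person 1", "3/6/2015"), ("Person 2", "3/7/2015")]
table_b = [("Person 1", "3/2/2015"), ("Person 1", "3/6/2015"),
           ("Person 1", "3/5/2015"), ("Person 2", "3/8/2015")]

matched, unmatched, leftover = match_one_to_one(table_a, table_b)
print(unmatched)  # [('Person 2', '3/7/2015')]
print(leftover)   # Counter({('Person 2', '3/8/2015'): 1})
```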
I'm still new to Excel VBA programming.
I need to extract data from one sheet to another in a different format. However, most tutorials only seem to copy data row by row or range by range to a fresh sheet.
So this is my DataSheet1:
School          Principal   Student  Student ID
United College  Bill Gates  Peter    p3214
United College  Bill Gates  Mary     p4213
United College  Bill Gates  Forge    p7621
Beverly High    Melin Brad  Goge     p1111
Beverly High    Melin Brad  Fred     p2222
I want to put the data into a custom format on another worksheet. Here is my desired result:
School United College
Principal Bill Gates
Student Student ID
Peter p3214
Mary p4213
Forge p7621
School Beverly High
Principal Melinda Brad
Student Student ID
Goge p1111
Fred p2222
Below is my code to copy from Sheet1 to Sheet2, but it only copies data from one range to another. What concepts should I use to write the data in a custom format? My code:
Dim secondsheet As Worksheet
Set secondsheet = ThisWorkbook.Worksheets(2)
Dim firstsheet As Worksheet
Set firstsheet = ThisWorkbook.Worksheets(1)
secondsheet.Range("A1", "C10").Value = firstsheet.Range("A1", "C10").Value
And the format that I intend to put my data in:
Range(<<call function for range>>).Select
With Selection
.Value = "School"
.Offset(1,0).Value = "Principal"
.Offset(1,0).Font.Bold = True
.Offset(4,1).Value = "Student"
.Offset(4,1).Font.Bold = True
.Offset(4,2).Value = "Student ID"
.Offset(4,2).Font.Bold = True
End With
So I think the answer is some kind of loop, since every school follows this same format. Could someone help me understand the concept in VBA?
Keep the data in Sheet1 and run the macro. There may be small errors in it, but they should be easy to fix once you work through it.
Sub ReformatSchools()
Dim a As Long, b As Long, c As Long
Dim col1 As Variant, prin1 As Variant
' Sheet1 row 1 holds the headers; data starts on row 2
col1 = Sheet1.Cells(2, 1)
prin1 = Sheet1.Cells(2, 2)
Sheet2.Cells(1, 1) = "School"
Sheet2.Cells(1, 2) = "Principal"
Sheet2.Cells(2, 1) = col1
Sheet2.Cells(2, 2) = prin1
b = 5 ' next Sheet2 row for a student
c = 1 ' first Sheet2 row of the current school's block
Sheet2.Cells(b - 1, 1) = "Student"
Sheet2.Cells(b - 1, 2) = "St ID"
a = 2
Do While Sheet1.Cells(a, 1) <> ""
If Sheet1.Cells(a, 1) = col1 Then
' Same school: append the student to the current block
Sheet2.Cells(b, 1) = Sheet1.Cells(a, 3)
Sheet2.Cells(b, 2) = Sheet1.Cells(a, 4)
b = b + 1
Else
' New school: leave a blank row, then start a new block
c = b + 1
b = c + 3
col1 = Sheet1.Cells(a, 1)
prin1 = Sheet1.Cells(a, 2)
Sheet2.Cells(c, 1) = "School"
Sheet2.Cells(c, 2) = "Principal"
Sheet2.Cells(c + 1, 1) = col1
Sheet2.Cells(c + 1, 2) = prin1
Sheet2.Cells(b - 1, 1) = "Student"
Sheet2.Cells(b - 1, 2) = "St ID"
Sheet2.Cells(b, 1) = Sheet1.Cells(a, 3)
Sheet2.Cells(b, 2) = Sheet1.Cells(a, 4)
b = b + 1
End If
a = a + 1
Loop
End Sub
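The macro walks the rows once and starts a new block whenever the school changes (a "control break"). The same concept, sketched in Python with itertools.groupby to show it independently of VBA, assuming each school's rows are contiguous as in DataSheet1; `build_report` is a hypothetical helper that returns the output rows as lists:

```python
from itertools import groupby

def build_report(rows):
    """Turn flat (school, principal, student, student_id) rows into blocks."""
    out = []
    # groupby only merges adjacent rows, so each school's rows must be contiguous.
    for (school, principal), group in groupby(rows, key=lambda r: (r[0], r[1])):
        if out:
            out.append([])  # blank row between schools, as the macro leaves
        out.append(["School", school])
        out.append(["Principal", principal])
        out.append(["Student", "Student ID"])
        out.extend([student, sid] for _, _, student, sid in group)
    return out

rows = [
    ("United College", "Bill Gates", "Peter", "p3214"),
    ("United College", "Bill Gates", "Mary", "p4213"),
    ("United College", "Bill Gates", "Forge", "p7621"),
    ("Beverly High", "Melin Brad", "Goge", "p1111"),
    ("Beverly High", "Melin Brad", "Fred", "p2222"),
]
report = build_report(rows)
for line in report:
    print(*line)
```

In VBA, the equivalent is keeping the current school in a variable and writing a new header block whenever `Sheet1.Cells(a, 1)` differs from it.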