Python: Fill a column in a DataFrame if a condition is met [closed] - pandas

Let us begin by calculating the attendence_score of each student. Do the following:
Create a new column called attendence_score.
Fill in the column using the following criteria:
No Absence = 5
1-5 Absences = 4
6-10 Absences = 3
11-15 Absences = 2
16-20 Absences = 1
21 or more Absences = 0
The dataset has a column named absences.
My idea is to use an if condition to do this.
But I searched through a lot of code here, and most of it deals with filling in NaN data. How do I handle my case?

The manual way:
s = df['absences']
df.loc[s == 0, 'absence_score'] = 5
df.loc[s.between(1, 5), 'absence_score'] = 4
df.loc[s.between(6, 10), 'absence_score'] = 3
df.loc[s.between(11, 15), 'absence_score'] = 2
df.loc[s.between(16, 20), 'absence_score'] = 1
df.loc[s >= 21, 'absence_score'] = 0
Using pd.cut to bin the values into categories:
df['absence_score'] = pd.cut(df['absences'], [-np.inf, 0, 5, 10, 15, 20, np.inf], labels=range(5,-1,-1))
Or you can take advantage of the uniform step across the levels and use a math formula:
df['absence_score'] = 5 - np.ceil(df['absences'].div(5).clip(upper=5)).astype('int')
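A quick way to convince yourself that the three approaches agree is to run them on the boundary values. The sketch below is only an illustration: the sample `absences` values are made up (they are not from the question), and the last bucket is taken as "21 or more", as the criteria state.

```python
import numpy as np
import pandas as pd

# Hypothetical sample covering every bucket boundary
df = pd.DataFrame({"absences": [0, 1, 5, 6, 10, 11, 15, 16, 20, 21, 40]})
s = df["absences"]

# 1) Manual boolean masks
manual = pd.Series(0, index=df.index)
manual[s == 0] = 5
manual[s.between(1, 5)] = 4
manual[s.between(6, 10)] = 3
manual[s.between(11, 15)] = 2
manual[s.between(16, 20)] = 1
manual[s >= 21] = 0

# 2) pd.cut with right-closed bins: (-inf, 0], (0, 5], ..., (20, inf]
binned = pd.cut(
    s, [-np.inf, 0, 5, 10, 15, 20, np.inf], labels=[5, 4, 3, 2, 1, 0]
).astype(int)

# 3) Arithmetic formula relying on the uniform step of 5
formula = 5 - np.ceil(s.div(5).clip(upper=5)).astype(int)

print(pd.DataFrame({"absences": s, "manual": manual,
                    "binned": binned, "formula": formula}))
```

Note that pd.cut returns a categorical column, so the sketch converts it with astype(int) before comparing.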

conditions = [
(df['likes_count'] <= 2),
(df['likes_count'] > 2) & (df['likes_count'] <= 9),
(df['likes_count'] > 9) & (df['likes_count'] <= 15),
(df['likes_count'] > 15)
]
# create a list of the values we want to assign for each condition
values = ['tier_4', 'tier_3', 'tier_2', 'tier_1']
# create a new column and use np.select to assign values to it using our lists as arguments
df['tier'] = np.select(conditions, values)
# display updated DataFrame
df.head()
or like this?

df = student
print(df)
#df['attendence_score'] = np.where((df['absences'] =0 ) ,5, df['attendence_score'])
#df.loc[df['absences'] = 0, 'attendence_score'] = 5
attendence_score = [
(df['absences'] == 0),
(df['absences'] > 0) & (df['absences'] <= 5),
(df['absences'] > 5) & (df['absences'] <= 10),
(df['absences'] > 10) & (df['absences'] <= 15),
(df['absences'] > 15) & (df['absences'] <= 20),
(df['absences'] > 20)
]
# create a list of the values we want to assign for each condition
values = [5, 4, 3, 2, 1, 0]
# create a new column and use np.select to assign values to it using our lists as arguments
df['attendence_score'] = np.select(attendence_score, values)
# display updated DataFrame
df.head()
I finished it by myself. I love myself!!!!

Related

how do I select rows from pandas df without returning False values?

I have a df and I need to select rows based on some conditions in multiple columns.
Here is what I have
import pandas as pd
dat = [('p','q', 5), ('k','j', 2), ('p','-', 5), ('-','p', 4), ('q','pkjq', 3), ('pkjq','q', 2)]
df = pd.DataFrame(dat, columns = ['a', 'b', 'c'])
df_dat = df[(df[['a','b']].isin(['k','p','q','j']) & df['c'] > 3)] | df[(~df[['a','b']].isin(['k','p','q','j']) & df['c'] > 2 )]
Expected result = [('p','q', 5), ('p','-', 5), ('-','p', 4), ('q','pkjq', 3)]
Result I am getting is an all false dataframe
When you have a complicated condition, I recommend building the conditions outside the slice:
cond1 = df[['a','b']].isin(['k','p','q','j']).any(axis=1) & df['c'].gt(3)
cond2 = (~df[['a','b']].isin(['k','p','q','j'])).any(axis=1) & df['c'].gt(2)
out = df.loc[cond1 | cond2]
Out[305]:
   a     b  c
0  p     q  5
2  p     -  5
3  -     p  4
4  q  pkjq  3
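One likely contributor to the all-False result in the question is operator precedence: in Python, `&` binds more tightly than `>`, so `x & y > 3` is parsed as `(x & y) > 3`. The sketch below first shows this with plain integers and then rebuilds the two conditions with explicit parentheses. The names `allowed` and `hit` are mine, introduced only for illustration.

```python
import pandas as pd

# Plain-Python illustration of the precedence rule
unparenthesized = 1 & 5 > 3    # parsed as (1 & 5) > 3
parenthesized = 1 & (5 > 3)    # comparison evaluated first
print(unparenthesized, parenthesized)

dat = [('p', 'q', 5), ('k', 'j', 2), ('p', '-', 5),
       ('-', 'p', 4), ('q', 'pkjq', 3), ('pkjq', 'q', 2)]
df = pd.DataFrame(dat, columns=['a', 'b', 'c'])

allowed = ['k', 'p', 'q', 'j']           # hypothetical helper name
hit = df[['a', 'b']].isin(allowed)       # boolean DataFrame, one column per input column

# Reduce each row to a single boolean, and wrap every comparison in parentheses
cond1 = hit.any(axis=1) & (df['c'] > 3)
cond2 = (~hit).any(axis=1) & (df['c'] > 2)

out = df[cond1 | cond2]
print(out)
```

Using `.gt(3)` as in the answer sidesteps the precedence issue entirely, because a method call needs no extra parentheses.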

I want to use values from dataframeA as upper and lower bounds to filter dataframeB

I have two dataframes A and B.
Dataframe A has 4 columns with 2 sets of maximum and minimums that I want to use as upper and lower bounds for 2 columns in dataframe B.
latitude = data['y']
longitude = data['x']
upper_lat = coords['lat_max']
lower_lat = coords['lat_min']
upper_lon = coords['long_max']
lower_lon = coords['long_min']
def filter_data_2(filter, upper_lat, lower_lat, upper_lon, lower_lon, lat, lon):
    v = filter[(lower_lat <= lat <= upper_lat ) & (lower_lon <= lon <= upper_lon)]
    return v
newdata = filter_data_2(data, upper_lat, lower_lat, upper_lon, lower_lon, latitude, longitude)
ValueError: Can only compare identically-labeled Series objects
MWE:
import pandas as pd
a = {'lower_lon': [2,4,6], 'upper_lon': [4,6,10], 'lower_lat': [1,3,5], 'upper_lat': [3,5,7]}
constraints = pd.DataFrame(data=a)
constraints
   lower_lon  upper_lon  lower_lat  upper_lat
0          2          4          1          3
1          4          6          3          5
2          6         10          5          7
b = {'lon' : [3, 5, 7, 9, 11, 13, 15], 'lat': [2, 4, 6, 8, 10, 12, 14]}
to_filter = pd.DataFrame(data=b)
to_filter
   lon  lat
0    3    2
1    5    4
2    7    6
3    9    8
4   11   10
5   13   12
6   15   14
lat = to_filter['lat']
lon = to_filter['lon']
lower_lon = constraints['lower_lon']
upper_lon = constraints['upper_lon']
lower_lat = constraints['lower_lat']
upper_lat = constraints['upper_lat']
v = to_filter[(lower_lat <= lat) & (lat <= upper_lat) & (lower_lon <= lon) & (lon <= upper_lon)]
Expected Results
v
   lon  lat
0    3    2
1    5    4
2    7    6
The overall filter is the union of the rows matched by each constraint. In pandas you could do:
pieces = []
for i in constraints.index:
    # Current constraints (columns are ordered lower_lon, upper_lon, lower_lat, upper_lat)
    min_lon, max_lon, min_lat, max_lat = constraints.loc[i, :]
    # Apply filter; each comparison needs its own parentheses because & binds tighter than >= and <=
    df = to_filter[(to_filter.lon >= min_lon) & (to_filter.lon <= max_lon) & (to_filter.lat >= min_lat) & (to_filter.lat <= max_lat)]
    # Collect the current filter outcome
    pieces.append(df)
# Join all outcomes in a single df and remove duplicates, if any
v = pd.concat(pieces).drop_duplicates()
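If the number of constraint rows is modest, the same union can also be computed without a Python loop by broadcasting every point against every bounding box. This is only a sketch of an alternative, not the answerer's method; it reuses the frames from the MWE, and the name `inside` is mine.

```python
import pandas as pd

constraints = pd.DataFrame({'lower_lon': [2, 4, 6], 'upper_lon': [4, 6, 10],
                            'lower_lat': [1, 3, 5], 'upper_lat': [3, 5, 7]})
to_filter = pd.DataFrame({'lon': [3, 5, 7, 9, 11, 13, 15],
                          'lat': [2, 4, 6, 8, 10, 12, 14]})

# Column vectors of shape (n_points, 1) broadcast against rows of shape (n_boxes,)
lon = to_filter['lon'].to_numpy()[:, None]
lat = to_filter['lat'].to_numpy()[:, None]

# inside[i, j] is True when point i lies within bounding box j
inside = (
    (lon >= constraints['lower_lon'].to_numpy())
    & (lon <= constraints['upper_lon'].to_numpy())
    & (lat >= constraints['lower_lat'].to_numpy())
    & (lat <= constraints['upper_lat'].to_numpy())
)

# Keep a point if it falls inside at least one box
v = to_filter[inside.any(axis=1)]
print(v)
```

Because each point is kept at most once, no drop_duplicates step is needed here. The intermediate array has n_points x n_boxes entries, so this trades memory for speed.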

Select Rows That Do Not Contain Any Negative or Missing Value

Assume a database table has a few hundred columns. In SQL statements, how would you select rows/records that do not contain any negative or missing value? Can you do it using the sqldf package for R users?
Here is an example of data frame with 6 rows and 2 columns:
D = data.frame(X = c(23, -24, 35, 12, 34, 41),
Y = c(100, 98, 89, NA, 56, 90))
The SQL statement(s) should only return a table containing the rows 1, 3, 5, and 6.
text = "X Y
23 100
-24 98
35 89
12 NA
34 56
41 90"
df = read.table(text=text, header = T)
# install.packages("sqldf")
library(sqldf)
conditions = c(">=0","NOT NULL")
columns = colnames(df)
applyConditions <- function(columns,conditions){
  grid = expand.grid(columns,conditions)
  apply(grid, 1,
        function(x) paste(x, collapse = " ")
  )
}
select <- "SELECT * FROM df where "
where <- paste(applyConditions(columns,conditions),collapse = " AND ")
sqldf(paste(select,where))
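To see the SQL that this approach generates, here is a rough Python equivalent using the standard-library sqlite3 module (sqldf uses SQLite by default). It is a sketch under assumptions: the table name `df` and columns `X`, `Y` mirror the example, the explicit `IS NOT NULL` spelling is my choice, and column names are interpolated directly into the query, so they must come from a trusted source.

```python
import sqlite3

rows = [(23, 100), (-24, 98), (35, 89), (12, None), (34, 56), (41, 90)]
columns = ["X", "Y"]
conditions = [">= 0", "IS NOT NULL"]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE df (X INTEGER, Y INTEGER)")
con.executemany("INSERT INTO df VALUES (?, ?)", rows)

# Pair every column with every condition, then AND everything together
where = " AND ".join(f"{col} {cond}" for cond in conditions for col in columns)
query = f"SELECT * FROM df WHERE {where}"
print(query)

result = con.execute(query).fetchall()
print(result)
con.close()
```

Since a comparison with NULL is never true in SQL, the `>= 0` test alone already excludes missing values; the NOT NULL condition just makes the intent explicit.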

Power BI Report Builder Indicator Formula

I am adding in an indicator to a PBI Report Builder Report. The indicator is based off multiple fields from the dataset so I need to use a formula, to create the three up/down/side arrows.
Previously in Crystal Reports this could be implemented using a series of IF statements as follows. The below example is what is required for the down arrow. (the other 2 arrows also have multiple calculations)
IF (({spScorecard_SLView;1.CATEGORY_ID} = 4) OR
({spScorecard_SLView;1.CATEGORY_ID} = 25)) THEN
IF ({spScorecard_SLView;1.PM_3MM_NC_CNT}-{spScorecard_SLView;1.3MM_NC_CNT}) <
0 THEN 'Down Arrow'
ELSE IF (({spScorecard_SLView;1.CATEGORY_ID} = 21)
OR({spScorecard_SLView;1.CATEGORY_ID} = 26) OR
({spScorecard_SLView;1.CATEGORY_ID} = 41)) THEN
IF ({spScorecard_SLView;1.CM_TOTAL_CNT}> 0) AND
(({spScorecard_SLView;1.PM_3MM_TOTAL_CNT} = 0) OR
({spScorecard_SLView;1.3MM_TOTAL_CNT} = 0)) AND
({spScorecard_SLView;1.3MM_NC_CNT} > 0) AND
(((({spScorecard_SLView;1.3MM_TOTAL_CNT} - {spScorecard_SLView;1.3MM_NC_CNT})
/ {spScorecard_SLView;1.3MM_TOTAL_CNT}) * 100) >= 0.00) THEN 'Down Arrow' //
ELSE IF ((((({spScorecard_SLView;1.3MM_TOTAL_CNT} -
{spScorecard_SLView;1.3MM_NC_CNT}) / {spScorecard_SLView;1.3MM_TOTAL_CNT}) *
100) -((({spScorecard_SLView;1.PM_3MM_TOTAL_CNT} -
{spScorecard_SLView;1.PM_3MM_NC_CNT}) /
{spScorecard_SLView;1.PM_3MM_TOTAL_CNT}) * 100))/100) < 0.00 THEN
'Down Arrow'
I am stuck as to how to do something similar in PBI Report builder. Should I create a formula in the Value field under Value and States, and then delete any arrow settings under the Indicator States?
Can you create a formula using 'Down Arrow' etc in an IIf statement? I can only get indicator data returned when selecting 1 field under Value, but I need multiple fields & conditions.
SSRS Reports are similar to PBI Report builder so if there are any examples using it that may be of help. I am connecting to a SQL Server stored proc to pull back the data.
Thanks
Blowers
I would approach it like this...
Set a formula in the Indicator Value so that you return a number that corresponds to the arrow you want to show (e.g. return 1, 2 or 3)
You can use whatever format you feel comfortable with, but I would suggest using the SWITCH() function rather than nested IIFs.
For example, this checks two fields and returns one of three values (this is just a random example to illustrate the point):
=SWITCH(
SUM(Fields!Amount.Value) >5000 AND Fields!Year.Value >2019, 1
, SUM(Fields!Amount.Value) >10000 AND Fields!Year.Value <2019, 2
, True, 3
)
Switch takes pairs of expressions and return values. It returns the value paired with the first expression that evaluates to True. So this reads...
If the aggregated Amount is greater than 5000 and the Year > 2019 then return 1
If the aggregated Amount is greater than 10000 and the Year < 2019 then return 2
Else return 3
The final 'True' always evaluates to true, so it acts like an ELSE.
Anyway, this will return a value of either 1, 2 or 3.
Then in the Indicator Properties, just set the range for each indicator to 1, 2 or 3.
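For readers who think more easily in a general-purpose language, the first-match behaviour described above can be sketched in Python. This is an illustration only: `switch` and `indicator_value` are hypothetical helpers, not part of Report Builder, and the sketch returns None when nothing matches purely as a placeholder.

```python
def switch(*pairs):
    """Return the value paired with the first condition that is true."""
    if len(pairs) % 2 != 0:
        raise ValueError("switch() expects condition/value pairs")
    for condition, value in zip(pairs[0::2], pairs[1::2]):
        if condition:
            return value
    return None  # placeholder for the no-match case


def indicator_value(total_amount, year):
    # Mirrors the example expression: two checks plus a catch-all
    return switch(
        total_amount > 5000 and year > 2019, 1,
        total_amount > 10000 and year < 2019, 2,
        True, 3,
    )


print(indicator_value(6000, 2020))
print(indicator_value(12000, 2018))
print(indicator_value(100, 2019))
```

The order of the pairs matters: a row that satisfies more than one condition gets the value of the earliest one.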
I ended up using IIF logic to solve this; some of the calculations were too awkward, and it was easier for me to use IIF.
Using the 1,2,3 indicator values works well.
Here is the expression that I ended up using:
=IIF(Fields!CATEGORY_ID.Value = 4 AND Sum(Fields!PM_3MM_NC_CNT.Value -
Fields!Q3MM_NC_CNT.Value) < 0,1,
IIF(Fields!CATEGORY_ID.Value = 25 AND Sum(Fields!PM_3MM_NC_CNT.Value -
Fields!Q3MM_NC_CNT.Value) < 0,1,
IIF(Fields!CATEGORY_ID.Value = 4 AND Fields!PM_3MM_NC_CNT.Value <> 0
AND Fields!Q3MM_NC_CNT.Value <> 0 AND Fields!PM_3MM_NC_CNT.Value =
Fields!Q3MM_NC_CNT.Value ,2,
IIF(Fields!CATEGORY_ID.Value = 25 AND Fields!PM_3MM_NC_CNT.Value <> 0
AND Fields!Q3MM_NC_CNT.Value <> 0 AND Fields!PM_3MM_NC_CNT.Value =
Fields!Q3MM_NC_CNT.Value ,2,
IIF(Fields!CATEGORY_ID.Value = 4 AND Sum(Fields!PM_3MM_NC_CNT.Value -
Fields!Q3MM_NC_CNT.Value) > 0, 3,
IIF(Fields!CATEGORY_ID.Value = 25 AND Sum(Fields!PM_3MM_NC_CNT.Value -
Fields!Q3MM_NC_CNT.Value) > 0, 3,
IIF(Fields!CATEGORY_ID.Value = 4 AND Fields!PM_3MM_NC_CNT.Value = 0 AND
Fields!Q3MM_NC_CNT.Value = 0,4,
IIF(Fields!CATEGORY_ID.Value = 25 AND Fields!PM_3MM_NC_CNT.Value = 0
AND Fields!Q3MM_NC_CNT.Value = 0,4,
IIF(Fields!CATEGORY_ID.Value = 21 AND Fields!CM_TOTAL_CNT.Value > 0 AND
Fields!PM_3MM_TOTAL_CNT.Value =0,1,
IIF(Fields!CATEGORY_ID.Value = 26 AND Fields!CM_TOTAL_CNT.Value > 0 AND
Fields!PM_3MM_TOTAL_CNT.Value =0,1,
IIF(Fields!CATEGORY_ID.Value = 41 AND Fields!CM_TOTAL_CNT.Value > 0 AND
Fields!PM_3MM_TOTAL_CNT.Value =0,1,
IIF(Fields!CATEGORY_ID.Value = 21 AND Fields!Q3MM_TOTAL_CNT.Value = 0
AND Fields!Q3MM_NC_CNT.Value > 0 AND Fields!TrendCalculation1.Value -
Fields!TrendCalculation2.Value > 0.00,1,
IIF(Fields!CATEGORY_ID.Value = 26 AND Fields!Q3MM_TOTAL_CNT.Value = 0
AND Fields!Q3MM_NC_CNT.Value > 0 AND Fields!TrendCalculation1.Value -
Fields!TrendCalculation2.Value > 0.00,1,
IIF(Fields!CATEGORY_ID.Value = 41 AND Fields!Q3MM_TOTAL_CNT.Value = 0
AND Fields!Q3MM_NC_CNT.Value > 0 AND Fields!TrendCalculation1.Value -
Fields!TrendCalculation2.Value > 0.00,1,
IIF(Fields!CATEGORY_ID.Value <> 12 AND Fields!CATEGORY_ID.Value <> 30
AND Fields!PM_3MM_TOTAL_CNT.Value = 0,3,
IIF(Fields!CATEGORY_ID.Value <> 12 AND Fields!CATEGORY_ID.Value <> 30
AND Fields!Q3MM_TOTAL_CNT.Value = 0,3,
IIF(Fields!CATEGORY_ID.Value <> 12 AND Fields!CATEGORY_ID.Value <> 30
AND Fields!TrendCalculation1.Value - Fields!TrendCalculation2.Value <
0.00,1,
IIF(Fields!CATEGORY_ID.Value <> 12 AND Fields!CATEGORY_ID.Value <> 30
AND Fields!TrendCalculation1.Value - Fields!TrendCalculation2.Value >
0.00,3,
IIF(Fields!CATEGORY_ID.Value <> 12 AND Fields!CATEGORY_ID.Value <> 30
AND Fields!TrendCalculation1.Value - Fields!TrendCalculation2.Value =
0.00,2,
5)))))))))))))))))))

Loop through array, assign values into cells

I have a 2D array like the one below. There are 26 values from 1-26, but also "bigger" categories, e.g. for the 2nd value: "Important", "very important" and "extremely important" are all classified as "check".
Can I integrate this into the array, e.g. by adding Priority(1, 3) after "Important"? Or (2, 3)? Sorry, I am just starting with arrays and do not fully understand them yet. I then want to populate the values into columns. For example, if column = 1 then column2 = "Important" and column3 = "check", and so on.
Dim Priority(1 To 26, 1 To 2)
Priority(1, 1) = 1: Priority(1, 2) = "Important"
For Each Zelle In Range(Cells(FirstRow + 2, 14), Cells(LastRow, 14))
    Zelle.Offset(0, 1) = Application.VLookup(Zelle, Priority, 3, False)
    Zelle.Value = Application.VLookup(Zelle, Priority, 2, False)
    'Zelle = IIf(IsError(Zelle), "???", "Zelle")
Next Zelle
It checks CellX and then goes directly to CellX+1, and so on... I am sure this could also be done with a For i loop.