I have a dataframe as shown below
ID Price Duration
1 100 60
2 200 2
3 1 366
4 1 365
I would like to create flag columns based on conditions on the Price column and the Duration column.
Steps:
If Price is less than 20, flag it as False; otherwise flag it as True
If Duration is less than 30, flag it as False; otherwise flag it as True
Expected Output:
ID Price Duration Price_Flag Duration_Flag
1 100 60 True True
2 200 2 True False
3 1 366 False True
4 1 365 False True
One idea is to compare against a list of thresholds, in the order of the column names ['Price','Duration'], using DataFrame.gt:
df[['Price_Flag','Duration_Flag']] = df[['Price','Duration']].gt([20,30])
Or use Series.gt for each column separately:
df['Price_Flag'] = df['Price'].gt(20)
df['Duration_Flag'] = df['Duration'].gt(30)
Or use DataFrame.assign:
df = df.assign(Price_Flag = df['Price'].gt(20),
Duration_Flag = df['Duration'].gt(30))
print(df)
ID Price Duration Price_Flag Duration_Flag
0 1 100 60 True True
1 2 200 2 True False
2 3 1 366 False True
3 4 1 365 False True
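For reference, a minimal self-contained sketch that reconstructs the sample frame and applies the first approach (the data values are assumed from the question):

import pandas as pd

# sample data assumed from the question
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Price': [100, 200, 1, 1],
                   'Duration': [60, 2, 366, 365]})

# compare each column against its own threshold, in the order of the column list
df[['Price_Flag', 'Duration_Flag']] = df[['Price', 'Duration']].gt([20, 30])
print(df)

If values exactly equal to the thresholds should be flagged True, as the wording of the steps suggests, ge can be used in place of gt.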
I have a postgres table that looks like this:
A B
5 4
10 10
13 15
100 25
20 Null
Using SQL, I would like to check whether the value in column A is larger than the value in column B and, if so, add a 1 to the column True. If the value in column A is smaller than or equal to the value in column B, or if column B contains a [NULL] value, I would like to add a 1 to the column False, like so:
A B True False
5 4 1 0
10 10 0 1
13 15 0 1
100 25 1 0
20 [NULL] 0 1
What is the best way to achieve this?
You can use case logic:
select t.*,
(case when A > B then 1 else 0 end) as true_col,
(case when A > B then 0 else 1 end) as false_col
from t;
I have a list of conditions to be run on the dataset to filter huge data.
df = a huge DataFrame, e.g.:
Index D1 D2 D3 D5 D6
0 8 5 0 False True
1 45 35 0 True False
2 35 10 1 False True
3 40 5 2 True False
4 12 10 5 False False
5 18 15 13 False True
6 25 15 5 True False
7 35 10 11 False True
8 95 50 0 False False
I have to filter the above df based on the given orders:
orders = [[A, B],[D, ~E, B], [~C, ~A], [~C, A]...]
# (where A, B, C, D, E are the conditions)
e.g.:
A = df['D1'].le(50)
B = df['D2'].ge(5)
C = df['D3'].ne(0)
D = df['D1'].ne(False)
E = df['D1'].ne(True)
# In the real scenario, I have 64 such conditions to be run on 5 million records.
For example, I have to run all these conditions to get the resultant output.
What is the easiest way to achieve this task: applying the orders using a for loop, map, or .apply?
df = df.loc[A & B]
df = df.loc[D & ~E & B]
df = df.loc[~C & ~A]
df = df.loc[~C & A]
Resultant df would be my expected output.
Here I am more interested in knowing how you would use a loop, map, or .apply to run multiple conditions that are stored in a list, rather than in the resultant output itself, such as:
for i in orders:
    df = df[all(i)]  # I am not able to implement this logic for each order
You are looking for a bitwise AND over all the elements inside orders. In that case:
df = df[np.concatenate(orders).all(0)]
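If an explicit loop is easier to follow, the same filter can be built by AND-ing every mask in every order into one combined mask. A minimal sketch, with made-up sample data and only a subset of the conditions from the question:

import pandas as pd

# hypothetical sample data mirroring the question's columns
df = pd.DataFrame({'D1': [8, 45, 35, 40], 'D2': [5, 35, 10, 5], 'D3': [0, 0, 1, 2]})

A = df['D1'].le(50)
B = df['D2'].ge(5)
C = df['D3'].ne(0)

orders = [[A, B], [~C, A]]

# start from an all-True mask and AND every condition in every order into it
mask = pd.Series(True, index=df.index)
for order in orders:
    for cond in order:
        mask &= cond

filtered = df[mask]  # selects the same rows as df[np.concatenate(orders).all(0)]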
I have a pandas DataFrame like the one below.
EventOccurrence Month
1 4
1 5
1 6
1 9
1 10
1 12
Need to add an identifier column to the above pandas DataFrame such that whenever Month is consecutive three times, a value of True is filled, else False. I explored a few options like shift and window without luck. Any pointer is appreciated.
EventOccurrence Month Flag
1 4 F
1 5 F
1 6 T
1 9 F
1 10 F
1 12 F
Thank You.
You can check whether the diff between rows is one, and the diff shifted by 1 is one as well:
df['Flag'] = (df.Month.diff() == 1) & (df.Month.diff().shift() == 1)
EventOccurrence Month Flag
0 1 4 False
1 1 5 False
2 1 6 True
3 1 9 False
4 1 10 False
5 1 12 False
Note that this will also return True if the months are consecutive more than 3 times, but that behaviour wasn't specified in the question, so I'll assume it's OK.
If it needs to only flag the third one, and not for example the fourth consecutive instance, you could add a condition:
df['Flag'] = (df.Month.diff() == 1) & (df.Month.diff().shift() == 1) & (df.Month.diff().shift(2) !=1)
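If the flag is wanted as the 'T'/'F' letters shown in the expected output rather than booleans, a small mapping step can be added. A minimal sketch, reconstructing the sample frame from the question:

import pandas as pd

df = pd.DataFrame({'EventOccurrence': [1, 1, 1, 1, 1, 1],
                   'Month': [4, 5, 6, 9, 10, 12]})

diff = df.Month.diff()
flag = (diff == 1) & (diff.shift() == 1)  # True from the third consecutive month onwards
df['Flag'] = flag.map({True: 'T', False: 'F'})
print(df)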
I am trying to randomly select values from a short list, sum them, then iterate that process many times. So far I think I am successful in generating the values:
Work so far
Price list (items 1 to 15):

Item  Price
1     $29.74
2     $0.85
3     $38.24
4     $17
5     $25.50
6     $18.70
7     $29.75
8     $14.45
9     $30.60
10    $26.34
11    $11.05
12    $124.95
13    $9.35
14    $24.65
15    $41.65

Next to the price list sit 24 attempt rows, each holding a random draw, a hit flag (1 when the draw is below 0.125), the randomly chosen item number, and the looked-up price (FALSE when there is no hit). In the pasted run four attempts hit (items 4, 14, 3 and 10):

Total 106.23
In any set there are 24 attempts at selecting an item from the price list. What I have done is randomly generate a number and, if it is less than 0.125 (a 1/8 chance of getting a value from the price list per attempt), I generate a random number between 1 and 15 and then use VLOOKUP to get the price.
However, I want to iterate this process many times: say, over 100 runs each consisting of 24 attempts, what is the average value I return? I cannot find a way to simply add the number to itself each time I update the random numbers, and my VBA is pretty limited; I was considering a loop with a click button to refresh the numbers. Pseudo code, since I know very little VBA:
for i = 1:100
    clickbutton()  # to refresh
    grandtotal = grandtotal + total
end
averagevalue = grandtotal / i
I know it seems really easy, but I have not had luck searching for how to refresh with the click button, or whether that is even the best way. Thanks!
Pseudo code is a good place to start! If you name the total cell in your sheet to "Total", you should be able to run the below, which is my take on your pseudo code:
Sub AverageTotal()
    Dim noRuns As Long, i As Long
    Dim grandTotal As Double

    noRuns = 100
    grandTotal = 0
    For i = 1 To noRuns
        ActiveSheet.Calculate  ' recalculate the sheet so the random numbers refresh
        grandTotal = grandTotal + Range("Total").Value
    Next i
    MsgBox "The average value is " & grandTotal / noRuns
End Sub
If that doesn't work, have a play with it and let me know how you get on.
It seems DataFrame.le doesn't operate in a column-wise fashion.
from pandas import *
from numpy.random import *

df = DataFrame(randn(8, 12))
series = Series(rand(8))
df.le(series)
I would expect that each column in df is compared against series (so 12 column-versus-series comparisons in total, i.e. 12 columns * 8 rows element comparisons). But it appears that each element in df is compared against every element in series, which involves 12 (columns) * 8 (rows) * 8 (elements in series) comparisons. How can I achieve a column-by-column comparison?
The second question: once I am done with the column-wise comparison, I want to count for each row how many True values there are. I am currently doing astype(int32) to turn the bools into ints and then sum; does this sound reasonable?
Let me give an example for the first question to show what I mean (using a simpler example, since showing 8x12 is unwieldy):
>>>from pandas import *
>>>from numpy.random import *
>>>df = DataFrame(randn(2,5))
>>>t = DataFrame(randn(2,1))
>>>df
0 1 2 3 4
0 -0.090283 1.656517 -0.183132 0.904454 0.157861
1 1.667520 -1.242351 0.379831 0.672118 -0.290858
>>>t
0
0 1.291535
1 0.151702
>>>df.le(t)
0 1 2 3 4
0 True False False False False
1 False False False False False
What I expect for df's column 1 is:
1
False
True
Because 1.656517 < 1.291535 is False and -1.242351 < 0.151702 is True; this is the column-wise comparison. However, the printout is False, False.
I'm not sure I understand the first part of your question, but as to the second part, you can count the Trues in a boolean DataFrame using sum:
In [11]: df.le(s).sum(axis=0)
Out[11]:
0 4
1 3
2 7
3 3
4 6
5 6
6 7
7 6
8 0
9 0
10 0
11 0
dtype: int64
Essentially le is testing for each column:
In [21]: df[0] < s
Out[21]:
0 False
1 True
2 False
3 False
4 True
5 True
6 True
7 True
dtype: bool
Which for each index is testing:
In [22]: df[0].loc[0] < s.loc[0]
Out[22]: False
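As for the first part of the question, the flexible comparison methods take an axis argument that aligns the other operand on the row index, which gives the column-by-column comparison described above. A minimal sketch, reusing the names from the 2x5 example and passing t's single column as a Series:

import pandas as pd
from numpy.random import randn

df = pd.DataFrame(randn(2, 5))
t = pd.DataFrame(randn(2, 1))

# axis=0 aligns t[0] with df's row index, so every column of df is
# compared element-wise against the same length-2 series
column_wise = df.le(t[0], axis=0)

# counting the Trues per row needs no astype(): sum() treats booleans as 0/1
per_row_counts = column_wise.sum(axis=1)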