How to apply a regular function to create a third df column? - pandas

I have a df that currently has two columns, df[['sys1','dia1']].
I created the function shown below (the parameters are d = df['dia1'] and s = df['sys1']).
Now I am trying to create a third column using this function. It would look like this:
df['FirstVisitStge'] = df['FirstVisitStge'].apply(classify)
I am getting an error. I even tried using predefined parameters in the function and I still get an error. What am I doing wrong?
def classify(d,s):
    if (d>=90 & d<100 & s<160) or (s >= 140 & s < 160 & d < 100):
        return 'Stage 1'
    elif (s >= 160 & s <180 & d <110) or (d >= 100 and d < 110 and s > 180):
        return 'Stage 2'
    elif s >= 180 or d >= 110:
        return 'hypertensive crisis'
    else:
        return 'NA'
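A possible way to build the third column, offered as a sketch rather than a definitive fix. Two things stand out in the attempt above: df['FirstVisitStge'] does not exist yet, so .apply cannot be called on it, and Series.apply passes a single value while classify expects two. The sketch applies the function row by row with axis=1 and swaps & for and, because & binds more tightly than the comparison operators. The thresholds are copied unchanged from the question, and the sample readings are hypothetical.

```python
import pandas as pd

def classify(d, s):
    # d and s are single numbers here, so plain `and`/`or` with chained
    # comparisons work; `&` binds tighter than `>=` and would misgroup the tests.
    if (90 <= d < 100 and s < 160) or (140 <= s < 160 and d < 100):
        return 'Stage 1'
    elif (160 <= s < 180 and d < 110) or (100 <= d < 110 and s > 180):
        return 'Stage 2'
    elif s >= 180 or d >= 110:
        return 'hypertensive crisis'
    else:
        return 'NA'

# Hypothetical sample readings, not data from the question.
df = pd.DataFrame({'sys1': [150, 165, 190, 120],
                   'dia1': [95, 105, 85, 80]})

# axis=1 hands each row to the lambda, which picks out the two columns.
df['FirstVisitStge'] = df.apply(
    lambda row: classify(row['dia1'], row['sys1']), axis=1)

print(df)
```

For large frames, a vectorised approach with boolean masks would likely be faster, but the row-wise version stays closest to the original function.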

No method matching error when working with Vector{Int64}

I have the following code, where I first add two columns element-wise, which produces a Vector{Int64}:
df = CSV.read(joinpath("data", "data.csv"), DataFrame)
adding_columns = df.firstcolumn + df.secondcolumn
Then I define a function as follows:
function fnct(data::Vector{T}; var= 8) where { T <: Number }
    V = []
    for x in 1:size(data)[1]
        strt = x-var
        ending = x+var
        avg = 0
        if strt < 1
            for y in 1:x+var
                avg = avg+data[y]
            end
            avg = avg/(x+var-1)
        elseif ending > size(data)[1]
            for y in x-var:size(data)[1]
                avg = avg+data[y]
            end
            avg = avg/(size(data)-x-var)
        else
            for y in x-var:x+var
                avg = avg+data[y]
            end
            avg = avg/(2*var)
        end
        push!(V,avg)
    end
    return V
end
When I run:
typeof(adding_columns)
I get:
Vector{Int64}
However, when I call
fnct(adding_columns)
I get:
ERROR: MethodError: no method matching -(::Tuple{Int64}, ::Int64)
I presume that adding_columns is being treated as a Tuple, but I don't understand why, since typeof reports a Vector.
How can I solve this problem?
size(data) is a tuple:
julia> size([1,2,3]::Vector{Int})
(3,)
...but you're subtracting an integer from it in avg = avg/(size(data)-x-var).
Did you mean avg = avg/(length(data)-x-var) or avg = avg/(size(data, 1)-x-var)?

if elif(conditions) in Jython FDMEE keeps giving wrong result

This should be a simple if/elif chain in Jython, but Jython in FDMEE keeps taking the wrong branch.
def Timetest(strField, strRecord):
    import java.util.Date as date
    import java.text.SimpleDateFormat as Sdf
    import java.lang.Exception as Ex
    import java.sql as sql
    import java.util.concurrent.TimeUnit as TimeUnit
    PerKey = fdmContext["PERIODKEY"]
    strDate = strRecord.split(",")[11]
    #get maturity date
    strMM = strDate.split("/")[0]
    strDD = strDate.split("/")[1]
    strYYYY = strDate.split("/")[2]
    strDate = ("%s.%s.%s" % (strMM,strDD, strYYYY))
    #converting Maturity date
    sdf = Sdf("MM.dd.yyyy")
    strMRD = sdf.parse(strDate)
    #calc date diff
    diff = (strMRD.getTime()- PerKey.getTime())/86400000
    diff = ("%d" % diff)
    if diff>="0":
        if diff <= "30":
            return "Mat_Up1m " + diff
        elif diff <= "90":
            return "Mat_1to3m " + diff #the result goes here all the time although my diff is 367
        elif diff <= "360":
            return "Mat_3to12m " + diff
        elif diff <= "1800":
            return "Mat_1to5y " + diff #the result supposed to go here
        else:
            return "Mat_Over5y "+ diff
I'm not sure why it keeps going to the second elif instead of the fourth elif.
My calculated value is diff = 367.
Any idea how to make sure my code takes the correct if/elif branch?
I hope you have already figured it out. If not, here are the mistakes in your script.
Your script compares the string "367" with the strings "0", "90", and so on. That is a string comparison, not an integer comparison: strings are compared character by character, and since "3" sorts before "9", the string "367" is always less than the string "90", so execution goes into that elif branch.
Do an integer comparison instead: keep diff as a number (drop the diff = ("%d" % diff) conversion) and compare it with integer literals. Also make sure all the elif conditions sit inside the main if condition.
In the return statements, convert the integer to a string with str() before concatenating it.
The following code contains all of these changes.
if diff>=0:
    if diff <= 30:
        return "Mat_Up1m " + str(diff)
    elif diff <= 90:
        return "Mat_1to3m " + str(diff)
    elif diff <= 360:
        return "Mat_3to12m " + str(diff)
    elif diff <= 1800:
        return "Mat_1to5y " + str(diff)
    else:
        return "Mat_Over5y "+ str(diff)
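To see the difference between the two kinds of comparison outside FDMEE, here is a small sketch in plain Python. The helper name bucket is hypothetical, and it returns only the label, without the FDMEE context or the appended day count.

```python
def bucket(diff_days):
    """Hypothetical helper mirroring the corrected if/elif chain."""
    diff_days = int(diff_days)  # compare numbers, not strings
    if diff_days < 0:
        return None
    if diff_days <= 30:
        return "Mat_Up1m"
    elif diff_days <= 90:
        return "Mat_1to3m"
    elif diff_days <= 360:
        return "Mat_3to12m"
    elif diff_days <= 1800:
        return "Mat_1to5y"
    else:
        return "Mat_Over5y"

# Strings compare character by character: '3' sorts before '9'.
print("367" <= "90")   # True
# Integers compare by magnitude.
print(367 <= 90)       # False
print(bucket("367"))   # Mat_1to5y
```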

How to Plot a function of two variables in Julia with pyplot

I'm trying to plot a function of two variables with pyplot in Julia. The working starting point is the following (found here on Stack Overflow):
function f(z,t)
    return z*t
end
z = linspace(0,5,11)
t = linspace(0,40,4)
for tval in t
    plot(z, f(z, tval))
end
show()
This works for me and gives exactly what I wanted: a field of lines.
My own functions are as follows:
## needed functions ##
const gamma_0 = 6
const Ksch = 1.2
const Kver = 1.5
function Kvc(vc)
    if vc <= 0
        return 0
    elseif vc < 20
        return (100/vc)^0.1
    elseif vc < 100
        return 2.023/(vc^0.153)
    elseif vc == 100
        return 1
    elseif vc > 100
        return 1.380/(vc^0.07)
    else
        return 0
    end
end
function Kgamma(gamma_t)
    return 1-((gamma_t-gamma_0)/100)
end
function K(gamma_t, vc)
    return Kvc(vc)*Kgamma(gamma_t)*Ksch*Kver
end
I've tried to plot them as follows:
i = linspace(0,45,10)
j = linspace(0,200,10)
for i_val in i
    plot(i,K(i,j))
end
This gives me the following error:
isless has no method matching isless(::Int64, ::Array{Float64,1})
while loading In[51], in expression starting on line 3
in Kvc at In[17]:2 in anonymous at no file:4
Obviously, my function can't deal with an array.
Next try:
i = linspace(0,200,11)
j = linspace(0,45,11)
for i_val in i
    plot(i_val,map(K,i_val,j))
end
This gives me an empty plot with only the axes.
Can anybody please give me a hint?
EDIT
A simpler example:
using PyPlot
function P(n,M)
    return (M*n^3)/9550
end
M = linspace(1,5,5)
n = linspace(0,3000,3001)
for M_val in M
    plot(n,P(n,M_val))
end
show()
Solution
OK, with your help I found this solution for the shortened example, which works for me as intended:
function P(n,M)
    result = Array(Float64, length(n))
    for (idx, val) in enumerate(n)
        result[idx] = (M*val^3)/9550
    end
    return result
end
n = linspace(0,3000,3001)
for M_val = 1:5
    plot(n,P(n,M_val))
end
show()
This gives me what I wanted for this shortened example. The remaining question is: could it be done in a simpler, more elegant way?
I'll try to apply it to the original example and post the result when I succeed.
I don't completely follow all the details of what you're trying to accomplish, but here are examples of how you can modify a couple of your functions so that they accept and return arrays:
function Kvc(vc)
    result = Array(Float64, length(vc))
    for (idx, val) in enumerate(vc)
        if val <= 0
            result[idx] = 0
        elseif val < 20
            result[idx] = (100/val)^0.1
        elseif val < 100
            result[idx] = 2.023/(val^0.153)
        elseif val == 100
            result[idx] = 1
        elseif val > 100
            result[idx] = 1.380/(val^0.07)
        else
            result[idx] = 0
        end
    end
    return result
end
function Kgamma(gamma_t)
    return ones(length(gamma_t))-((gamma_t - gamma_0)/100)
end
Also, for your loop, I think you probably want something like:
for i_val in i
    plot(i_val,K(i_val,j))
end
rather than plot(i, K(i,j)), as that would just plot the same thing over and over.
< is defined for scalars. I think you need to broadcast it for arrays, i.e. use .<. Example:
julia> x = 2
2
julia> x < 3
true
julia> x < [3 4]
ERROR: MethodError: no method matching isless(::Int64, ::Array{Int64,2})
Closest candidates are:
  isless(::Real, ::AbstractFloat)
  isless(::Real, ::Real)
  isless(::Integer, ::Char)
 in <(::Int64, ::Array{Int64,2}) at .\operators.jl:54
 in eval(::Module, ::Any) at .\boot.jl:234
 in macro expansion at .\REPL.jl:92 [inlined]
 in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at .\event.jl:46
julia> x .< [3 4]
1x2 BitArray{2}:
 true  true

Looping through pandas data frame and creating new column value

I'm trying to loop through a CSV file which I converted into a pandas data frame.
I need to loop through each line, check the latitude and longitude data I have (2 separate columns), and append a code (0, 1 or 2) to the same line depending on whether the lat/long data falls within a certain range.
I'm somewhat new to Python and would love any help y'all might have.
It's throwing quite a few errors at me.
book = 'yellow_tripdata_2014-04.csv'
write_book = 'yellow_04.csv'
yank_max_long = -73.921630300
yank_min_long = -73.931169700
yank_max_lat = 40.832823000
yank_min_lat = 40.825582000
mets_max_long = 40.760523000
mets_min_long = 40.753277000
mets_max_lat = -73.841035400
mets_min_lat = -73.850564600
df = pd.read_csv(book)
##To check for Yankee Stadium Lat's and Long's, if within gps units then Stadium_Code = 1 , if mets then Stadium_Code=2
df['Stadium_Code'] = 0
for i, row in df.iterrows():
    if yank_min_lat <= float(row['dropoff_latitude']) <= yank_max_lat and yank_min_long <=float(row('dropoff_longitude')) <=yank_max_long:
        row['Stadium_Code'] == 1
    elif mets_min_lat <= float(row['dropoff_latitude']) <= mets_max_lat and mets_min_long <=float(row('dropoff_longitude')) <=mets_max_long:
        row['Stadium_Code'] == 2
I tried using the .loc command but ran into this error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-33-9a9166772646> in <module>()
----> 1 yank_mask = (df['dropoff_latitude'] > yank_min_lat) & (df['dropoff_latitude'] <= yank_max_lat) & (df['dropoff_longitude'] > yank_min_long) & (df['dropoff_longitude'] <= yank_max_long)
2
3 mets_mask = (df['dropoff_latitude'] > mets_min_lat) & (df['dropoff_latitude'] <= mets_max_lat) & (df['dropoff_longitude'] > mets_min_long) & (df['dropoff_longitude'] <= mets_max_long)
4
5 df.loc[yank_mask, 'Stadium_Code'] = 1
/Users/benjaminprice/anaconda/lib/python3.4/site-packages/pandas/core/frame.py in __getitem__(self, key)
1795 return self._getitem_multilevel(key)
1796 else:
-> 1797 return self._getitem_column(key)
1798
1799 def _getitem_column(self, key):
/Users/benjaminprice/anaconda/lib/python3.4/site-packages/pandas/core/frame.py in _getitem_column(self, key)
1802 # get column
1803 if self.columns.is_unique:
-> 1804 return self._get_item_cache(key)
1805
1806 # duplicate columns & possible reduce dimensionaility
/Users/benjaminprice/anaconda/lib/python3.4/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
1082 res = cache.get(item)
1083 if res is None:
-> 1084 values = self._data.get(item)
1085 res = self._box_item_values(item, values)
1086 cache[item] = res
/Users/benjaminprice/anaconda/lib/python3.4/site-packages/pandas/core/internals.py in get(self, item, fastpath)
2849
2850 if not isnull(item):
-> 2851 loc = self.items.get_loc(item)
2852 else:
2853 indexer = np.arange(len(self.items))[isnull(self.items)]
/Users/benjaminprice/anaconda/lib/python3.4/site-packages/pandas/core/index.py in get_loc(self, key, method)
1570 """
1571 if method is None:
-> 1572 return self._engine.get_loc(_values_from_object(key))
1573
1574 indexer = self.get_indexer([key], method=method)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)()
KeyError: 'dropoff_latitude'
I'm usually not too bad at figuring out what these error codes mean, but this one threw me off.
Firstly, it's pretty wasteful to iterate row-wise when there are vectorised solutions available that operate on the whole df at once.
I'd create a boolean mask for each of your 2 conditions and pass these to .loc to select the rows that meet the criteria and set their values.
Here the masks use the bitwise operator & to AND the conditions, and each condition is wrapped in parentheses because & has higher precedence than the comparison operators.
So the following should work:
yank_mask = (df['dropoff_latitude'] > yank_min_lat) & (df['dropoff_latitude'] <= yank_max_lat) & (df['dropoff_longitude'] > yank_min_long) & (df['dropoff_longitude'] <= yank_max_long)
mets_mask = (df['dropoff_latitude'] > mets_min_lat) & (df['dropoff_latitude'] <= mets_max_lat) & (df['dropoff_longitude'] > mets_min_long) & (df['dropoff_longitude'] <= mets_max_long)
df.loc[yank_mask, 'Stadium_Code'] = 1
df.loc[mets_mask, 'Stadium_Code'] = 2
If you haven't already, I'd read the docs, as they will help you understand how the above works.
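The masks above do not explain the KeyError: 'dropoff_latitude' itself. One possible cause, and this is only an assumption about the CSV, is stray whitespace in the header, so the real column name would be ' dropoff_latitude'. The sketch below uses a small hypothetical frame with such names, strips them, and then applies the masks. It also assumes the Mets bounds in the question have latitude and longitude swapped (the "long" values are near 40.75 and the "lat" values near -73.85), so they are assigned the other way round here. Series.between is inclusive at both ends by default, matching the <= comparisons in the original loop.

```python
import pandas as pd

# Hypothetical sample standing in for the CSV. The leading spaces in the
# column names are an assumption about what triggers the KeyError.
df = pd.DataFrame({
    ' dropoff_latitude': [40.8290, 40.7570, 40.7000],
    ' dropoff_longitude': [-73.9260, -73.8460, -73.9900],
})
df.columns = df.columns.str.strip()  # ' dropoff_latitude' -> 'dropoff_latitude'

yank_min_lat, yank_max_lat = 40.825582, 40.832823
yank_min_long, yank_max_long = -73.9311697, -73.9216303
# Assumption: the question's Mets lat/long values are swapped back here.
mets_min_lat, mets_max_lat = 40.753277, 40.760523
mets_min_long, mets_max_long = -73.8505646, -73.8410354

yank_mask = (df['dropoff_latitude'].between(yank_min_lat, yank_max_lat)
             & df['dropoff_longitude'].between(yank_min_long, yank_max_long))
mets_mask = (df['dropoff_latitude'].between(mets_min_lat, mets_max_lat)
             & df['dropoff_longitude'].between(mets_min_long, mets_max_long))

df['Stadium_Code'] = 0                 # default: neither stadium
df.loc[yank_mask, 'Stadium_Code'] = 1  # Yankee Stadium
df.loc[mets_mask, 'Stadium_Code'] = 2  # Mets

print(df)
```

Printing df.columns.tolist() on the real data is a quick way to check whether the names actually contain extra whitespace.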

Output/Print variable, in code, in if statement

I am trying to achieve the following:
The user is shown an Excel spreadsheet with a list of assumptions which they can change.
Title   | Value | Operator
Input01 | 10    | =
Input02 | 2     | >=
Input03 | 800   | >=
Input04 | 4     | >=
Input05 | 2     | <=
There is an If .. Then statement that pulls in data if the assumptions are met. However, if an assumption is blank, it should not be included in the If .. Then statement.
If x = Input01Value And y >= Input02Value _
    And z >= Input03Value And a >= Input04Value _
    And b <= Input05Value Then
The user omits Input03:
If x = Input01Value And y >= Input02Value _
    And a >= Input04Value And b <= Input05Value Then
Now I could check whether each value exists, and then follow it with another If statement with the appropriate variables. But this seems a bit redundant.
I was wondering if something like the following is possible:
Input01 = ""
If Input01Value != "" Then Input01 = "x = " & Input01Value
'Then use join or something similar to join all of them ..
And then use this Input01 directly in the If .. Then statement. This way, when a variable is empty, its And .. clause is not included and the If statement will not fail.
E.g. (I know this doesn't work, I'm just illustrating the scenario):
VBA: If Input01 Then
Result while compiling: If x = Input01Value Then
Please note, I know I could do something like the following:
If Boolean And Variable2 > 4 Then
and then have Boolean and Variable2 populated with a value from the cell. However, the issue with this is that if the user, for example, decides to omit Variable2 (which is reasonable), it will fail, e.g. If (Boolean = True) And > 4 Then.
I hope my question is clear. Thanks for the help.
What about using a function which operates on a select case depending on a string operator and two values?
Function conditionalString(condition As String, x As Variant, y As Variant) As Boolean
    Select Case condition
        Case "="
            If (x = y) Then
                conditionalString = True
            Else
                conditionalString = False
            End If
            Exit Function
        Case ">="
            conditionalString = (x >= y)
            Exit Function
        Case "<="
            conditionalString = (x <= y)
            Exit Function
        Case ">"
            conditionalString = (x > y)
            Exit Function
        Case "<"
            conditionalString = (x < y)
            Exit Function
        Case Else
            conditionalString = False
    End Select
End Function
You could then have another function, say one that checks that the value isn't blank, and call it before testing each assumption.
Expanding on my comment, you can use something like this to test each row of input.
Function TestIt(testValue,inputOperator,inputValue) As Boolean
    If Len(inputValue)=0 Then
        TestIt=True 'ignore this test: no value supplied
    Else
        TestIt=Application.Evaluate(testValue & inputOperator & inputValue)
    End If
End Function
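Both answers share one idea: test each input row on its own, treat a blank input as passing, and combine the results. Here is that idea sketched in Python purely for illustration. It is not VBA, and the names check_input and all_assumptions_met and the sample values are hypothetical.

```python
import operator

# Map each operator string from the sheet to a comparison function.
OPERATORS = {
    "=": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}

def check_input(test_value, input_operator, input_value):
    """Return True when the input is blank (test skipped) or the comparison holds."""
    if input_value is None or input_value == "":
        return True
    return OPERATORS[input_operator](test_value, input_value)

def all_assumptions_met(checks):
    """checks: iterable of (test_value, operator_string, input_value) tuples."""
    return all(check_input(*check) for check in checks)

# Hypothetical values for x, y, z, a, b tested against the five inputs.
# Input03 is left blank, so its test is skipped.
checks = [
    (10, "=", 10),
    (3, ">=", 2),
    (500, ">=", ""),
    (4, ">=", 4),
    (1, "<=", 2),
]
print(all_assumptions_met(checks))
```

A lookup table of operators avoids evaluating a string built from user input, which is the trade-off against the shorter Application.Evaluate version.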