How to convert variable type "SentinelArrays" to "Array{Float64,N}" in Julia

I am trying to optimize my Julia code by making it type-stable. Hence, I tried to declare the variable types in the function headers. But one of the variables has a type of ::SentinelArrays.ChainedVector{Float64,Array{Float64,1}} as shown in the code snippet below.
The code example:
using CSV, DataFrames
df = CSV.read("text.csv", DataFrame)
a = view(df, :, 1)
# this has a type of ::SentinelArrays.ChainedVector{Float64,Array{Float64,1}}
b = view(df, :, 2:4)
# while the type of this is ::Array{Float64,2}
# I would like to declare the argument types in the function.
function calc(a, b::Array{Float64,2})
    a .+ b  # broadcast: add the vector a to every column of b
end
I tried declaring typeof(a) in the signature, calc(a::SentinelArrays.ChainedVector{Float64,Array{Float64,1}}, b::Array{Float64,2}); however, this throws a "no method matching" error.
What is the correct way to declare this type, or can I convert it to a normal Array{Float64,1}?
Please suggest a solution to this issue. Thanks in advance.

You can just write Array(a) where a is your SentinelArray as here:
julia> u = SentinelArray(rand(1:8,4))
4-element SentinelVector{Int64, Int64, Missing, Vector{Int64}}:
2
3
5
3
julia> Array(u)
4-element Vector{Union{Missing, Int64}}:
2
3
5
3
However, normally you would just make the function signature something like:
function calc(a, b::AbstractArray{T,2}) where T
because this would work with both those types:
julia> SentinelMatrix{Int64, Int64, Missing, Matrix{Int64}} <: AbstractArray{T,2} where T
true

How to create an iteration for a df

How can I turn these statements into a loop that increments the 0 one by one up to 25 (including the column names, e.g. ES_0_BME680_TEMP to ES_1_BME680_TEMP, and so on up to 25) and produce output for all the calculations?
df['0_680ph20']=611.2*np.exp((17.625*df[['ES_0_BME680_TEMP']])/(243.12+df[['ES_0_BME680_TEMP']]))
df['0_680aH']=(df['ES_0_BME680_RH'] /100)*(df['0_680ph20']/(461.52*(df['ES_0_BME680_TEMP']+273.15)))*1000
df['0_680LN']=np.log(((df['0_680aH']/1000)*461.52*(df['ES_0_BME680_TEMP']+273.15))/(0.5*611.2))
df['0_680T_tar']=(df['0_680LN']*243.12)/(17.625-df['0_680LN'])
df['0_688ph20']=611.2*np.exp((17.625*df[['ES_0_BME688_TEMP']])/(243.12+df[['ES_0_BME688_TEMP']]))
df['0_688aH']=(df['ES_0_BME688_RH'] /100)*(df['0_688ph20']/(461.52*(df['ES_0_BME688_TEMP']+273.15)))*1000
df['0_688LN']=np.log(((df['0_688aH']/1000)*461.52*(df['ES_0_BME688_TEMP']+273.15))/(0.5*611.2))
df['0_688T_tar']=(df['0_688LN']*243.12)/(17.625-df['0_688LN'])
Thank you.
You could use a for loop, with f-strings to build the column names, something like:
for n in range(26):  # sensor indices 0..25
    df[f'{n}_680ph20']=611.2*np.exp((17.625*df[f'ES_{n}_BME680_TEMP'])/(243.12+df[f'ES_{n}_BME680_TEMP']))
    df[f'{n}_680aH']=(df[f'ES_{n}_BME680_RH'] /100)*(df[f'{n}_680ph20']/(461.52*(df[f'ES_{n}_BME680_TEMP']+273.15)))*1000
    df[f'{n}_680LN']=np.log(((df[f'{n}_680aH']/1000)*461.52*(df[f'ES_{n}_BME680_TEMP']+273.15))/(0.5*611.2))
    df[f'{n}_680T_tar']=(df[f'{n}_680LN']*243.12)/(17.625-df[f'{n}_680LN'])
    df[f'{n}_688ph20']=611.2*np.exp((17.625*df[f'ES_{n}_BME688_TEMP'])/(243.12+df[f'ES_{n}_BME688_TEMP']))
    df[f'{n}_688aH']=(df[f'ES_{n}_BME688_RH'] /100)*(df[f'{n}_688ph20']/(461.52*(df[f'ES_{n}_BME688_TEMP']+273.15)))*1000
    df[f'{n}_688LN']=np.log(((df[f'{n}_688aH']/1000)*461.52*(df[f'ES_{n}_BME688_TEMP']+273.15))/(0.5*611.2))
    df[f'{n}_688T_tar']=(df[f'{n}_688LN']*243.12)/(17.625-df[f'{n}_688LN'])
Also, you are doing the same 4 operations for two different sensor models (680 and 688). You could wrap them in a function, something like:
def expandata(df, digits, nitems):
    # digits: sensor model (680 or 688); nitems: highest sensor index
    for n in range(nitems+1):
        df[f'{n}_{digits}ph20']=611.2*np.exp((17.625*df[f'ES_{n}_BME{digits}_TEMP'])/(243.12+df[f'ES_{n}_BME{digits}_TEMP']))
        df[f'{n}_{digits}aH']=(df[f'ES_{n}_BME{digits}_RH'] /100)*(df[f'{n}_{digits}ph20']/(461.52*(df[f'ES_{n}_BME{digits}_TEMP']+273.15)))*1000
        df[f'{n}_{digits}LN']=np.log(((df[f'{n}_{digits}aH']/1000)*461.52*(df[f'ES_{n}_BME{digits}_TEMP']+273.15))/(0.5*611.2))
        df[f'{n}_{digits}T_tar']=(df[f'{n}_{digits}LN']*243.12)/(17.625-df[f'{n}_{digits}LN'])
expandata(df, 680, 25)
expandata(df, 688, 25)
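Adding about 200 columns one at a time (26 sensors x 2 models x 4 values) can make pandas emit a PerformanceWarning about a highly fragmented DataFrame. Below is a sketch that avoids this, assuming the column naming shown in the question: the hypothetical function expand_sensor_columns collects the new columns in a dict, concatenates them once, and returns a new frame instead of modifying df in place. The demo data is made up.

```python
import numpy as np
import pandas as pd


def expand_sensor_columns(df, model, nitems):
    """Return df plus the derived columns for sensors 0..nitems of BME<model>.

    Hypothetical helper; column names follow the question's pattern
    ES_{n}_BME{model}_TEMP and ES_{n}_BME{model}_RH.
    """
    new_cols = {}
    for n in range(nitems + 1):
        temp = df[f"ES_{n}_BME{model}_TEMP"]
        rh = df[f"ES_{n}_BME{model}_RH"]
        kelvin = temp + 273.15
        ph20 = 611.2 * np.exp((17.625 * temp) / (243.12 + temp))
        ah = (rh / 100) * (ph20 / (461.52 * kelvin)) * 1000
        ln = np.log(((ah / 1000) * 461.52 * kelvin) / (0.5 * 611.2))
        new_cols[f"{n}_{model}ph20"] = ph20
        new_cols[f"{n}_{model}aH"] = ah
        new_cols[f"{n}_{model}LN"] = ln
        new_cols[f"{n}_{model}T_tar"] = (ln * 243.12) / (17.625 - ln)
    # A single concat instead of many single-column inserts.
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)


# Made-up demo frame: sensors 0 and 1 of model 680.
demo = pd.DataFrame({
    "ES_0_BME680_TEMP": [20.0, 25.0],
    "ES_0_BME680_RH": [50.0, 60.0],
    "ES_1_BME680_TEMP": [18.0, 22.0],
    "ES_1_BME680_RH": [40.0, 50.0],
})
out = expand_sensor_columns(demo, 680, 1)
print(out.filter(like="T_tar"))
```

To cover both models you could write `for model in (680, 688): df = expand_sensor_columns(df, model, 25)`. In this chain of formulas an RH of 50 makes T_tar come back equal to the input temperature, which is a convenient sanity check.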

[pandas] Dividing all elements of a column by elements in another column (same df)

I'm sorry, I know this is basic, but I've spent 2 days trying to figure it out myself by sifting through documentation, to no avail.
My code:
import numpy as np
import pandas as pd
name = ["bob","bobby","bombastic"]
age = [10,20,30]
price = [111,222,333]
share = [3,6,9]
list = [name,age,price,share]
list2 = np.transpose(list)
dftest = pd.DataFrame(list2, columns = ["name","age","price","share"])
print(dftest)
        name age price share
0        bob  10   111     3
1      bobby  20   222     6
2  bombastic  30   333     9
I want to divide all elements in the 'price' column by all elements in the 'share' column. I've tried:
print(dftest[['price']/['share']]) - Failed
dftest['price']/dftest['share'] - Failed, unsupported operand type
dftest.loc[:,'price']/dftest.loc[:,'share'] - Failed
Wondering whether I could just convert everything to int or float, I tried:
dftest.astype(float) - can't convert from str to float
I've tried the iter and items methods but could not understand the printouts...
My only remaining idea is to use something called iterate, which I can't wrap my head around despite reading other old posts...
Please help me T_T
Apologies in advance for the somewhat protracted answer, but the question is somewhat unclear with regard to what exactly you're attempting to accomplish.
If you simply want price[0]/share[0], price[1]/share[1], etc. you can just do:
dftest['price_div_share'] = dftest['price'] / dftest['share']
The issue with the operand types can be solved by:
dftest['price_div_share'] = dftest['price'].astype(float) / dftest['share'].astype(float)
You're getting the "can't convert from str to float" error because you're calling astype(float) on the ENTIRE dataframe, which contains string columns.
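A minimal sketch of converting only the numeric columns: it rebuilds the question's frame (where np.transpose turns every value into a string), converts age, price and share with pd.to_numeric, and leaves name untouched.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; np.transpose of the mixed list makes every value a string.
name = ["bob", "bobby", "bombastic"]
age = [10, 20, 30]
price = [111, 222, 333]
share = [3, 6, 9]
dftest = pd.DataFrame(np.transpose([name, age, price, share]),
                      columns=["name", "age", "price", "share"])
print(dftest.dtypes)  # every column holds strings

# Convert only the numeric columns; "name" stays a string column.
num_cols = ["age", "price", "share"]
dftest[num_cols] = dftest[num_cols].apply(pd.to_numeric)

# Element-wise division, row by row.
dftest["price_div_share"] = dftest["price"] / dftest["share"]
print(dftest)
```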
If you want to divide each item by each item, i.e. price[0] / share[0], price[0] / share[1], price[0] / share[2], price[1] / share[0], etc., you would need to iterate through each pair of items and append the result to a new list. You can do that easily with a nested for loop, although it may take some time on a large dataset. It would look something like this if you simply want the result:
new_list = []
for p in dftest['price'].astype(float):
    for s in dftest['share'].astype(float):
        new_list.append(p/s)
If you want this in a new dataframe, you can simply pass it to the pd.DataFrame() constructor:
new_df = pd.DataFrame(new_list, columns=['price_divided_by_share'])
This new dataframe would only have one column (the result, as mentioned above). If you want the information from the original dataframe as well, then you would do something like the following:
new_list = []
for n, a, p in zip(dftest['name'], dftest['age'], dftest['price'].astype(float)):
    for s in dftest['share'].astype(float):
        new_list.append([n, a, p, s, p/s])
new_df = pd.DataFrame(new_list, columns=['name', 'age', 'price', 'share', 'price_div_by_share'])
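The nested loops above can also be written without Python-level loops. A sketch, assuming the columns are already numeric: NumPy broadcasting gives the full price-by-share matrix, and a cross join (merge with how="cross", available in pandas 1.2 and later) produces the same long table as new_df, in the same row order.

```python
import numpy as np
import pandas as pd

dftest = pd.DataFrame({
    "name": ["bob", "bobby", "bombastic"],
    "age": [10, 20, 30],
    "price": [111.0, 222.0, 333.0],
    "share": [3.0, 6.0, 9.0],
})

# price as a column vector (3x1) broadcast against share as a row vector (1x3):
# ratios[i, j] == price[i] / share[j]
ratios = dftest["price"].to_numpy()[:, None] / dftest["share"].to_numpy()[None, :]
print(ratios)

# Long format: every (name, age, price) row paired with every share value.
pairs = dftest[["name", "age", "price"]].merge(dftest[["share"]], how="cross")
pairs["price_div_by_share"] = pairs["price"] / pairs["share"]
print(pairs)
```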
If you check the data types of your dataframe, you will see that they are all strings (object type):
dftest.dtypes
name object
age object
price object
share object
dtype: object
The first step is to change the relevant columns to numbers. Here is one way:
dftest = dftest.set_index("name").astype(float)
dftest.dtypes
age float64
price float64
share float64
dtype: object
This way you make the names a useful index and separate them from the numeric data. This is just a suggestion; you may have other reasons to keep name as a column, in which case you have to change the data type of each numeric column individually.
Once that is done, you can safely run your code:
dftest.div(dftest.share,axis=0)
age price share
name
bob 3.333333 37.0 1.0
bobby 3.333333 37.0 1.0
bombastic 3.333333 37.0 1.0
I assume this is the outcome you expect; if not, you can tweak it. The main point is to convert your data to numeric types before any computation/division.

AttributeError: 'int' object has no attribute 'count' while using itertuples() method with dataframes

I am trying to iterate over rows in a pandas DataFrame using the itertuples() method, which works fine for my case. Now I want to check whether a specific value ('x') occurs in a specific field of a tuple. I used the count() method for that, as I need the number of occurrences of x later.
The weird part is that for some fields this works just fine (e.g. in my case (namedtuple[7].count('x')) + (namedtuple[8].count('x'))), but for others (e.g. namedtuple[9].count('x')) I get AttributeError: 'int' object has no attribute 'count'.
Would appreciate your help very much!
Apparently, some columns of your DataFrame are of object type (actually strings) and some are of int type (more generally, numbers).
To count the occurrences of x in each row, apply a function to every element of the row that checks whether the element is a str: if it is, the function returns count('x'); if not, it returns 0 (don't attempt to look for x in a number).
Applied to a row, this gives a Series with the number of x's in each column separately, so to get the total for the whole row, sum that Series.
Example of working code:
Test DataFrame:
C1 C2 C3
0 axxv bxy 10
1 vx cy 20
2 vv vx 30
Code:
for ind, row in df.iterrows():
print(ind, row.apply(lambda it:
it.count('x') if type(it).__name__ == 'str' else 0).sum())
(in my opinion, iterrows is more convenient here).
The result is:
0 3
1 1
2 1
So as you can see, it is possible to count occurrences of x,
even when some columns are not strings.
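Since the question uses itertuples, here is a sketch of the same type check applied there (fields that are not str, such as the int in namedtuple[9], are skipped), followed by a vectorized alternative that counts only in non-numeric columns with Series.str.count. The DataFrame is the test one shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "C1": ["axxv", "vx", "vv"],
    "C2": ["bxy", "cy", "vx"],
    "C3": [10, 20, 30],
})

# itertuples version: only call count() on string fields.
totals = []
for row in df.itertuples(index=False):
    totals.append(sum(field.count("x") for field in row if isinstance(field, str)))
print(totals)  # [3, 1, 1]

# Vectorized alternative: count in the non-numeric columns, then sum per row.
text_cols = df.select_dtypes(exclude="number")
counts = text_cols.apply(lambda col: col.str.count("x")).sum(axis=1)
print(counts.tolist())  # [3, 1, 1]
```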

Octave keyboard input function to filter concatenated string and integer?

If the user types 12wkd3, how can I extract 123 from it as an integer in Octave?
Example in Octave:
A = input("A?\n")
A?
12wkd3
A = 123
where 12wkd3 is the user's keyboard input and A = 123 is the expected result.
Assuming that the general form you're looking for is taking an arbitrary string from user input, removing anything non-numeric, and storing the result as an integer:
A = input("A?\n", 's');
A = int32(str2num(A(isdigit(A))));
example output:
A?
324bhtk.p89u34
A = 3248934
To clarify what's written above:
In the input statement, the 's' argument causes the answer to be stored as a string. Without it, Octave evaluates the input first: most inputs like this would produce errors, and others may be interpreted as functions or variables.
isdigit(A) produces a logical array the same size as A, with a 1 for every character that is a digit 0-9 and 0 otherwise.
isdigit('a1 3 b.') = [0 1 0 1 0 0 0]
A(isdigit(A)) produces a substring of A containing only the characters that correspond to a 1 in the logical array. For example, with A = 'a1 3 b.':
A(isdigit(A)) = '13'
That is still a string, so you need to convert it into a number using str2num(). That, however, returns a double-precision number, so finally, to get an integer, you can use int32().
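For comparison, the same filter-then-convert idea can be sketched in Python. The function name digits_to_int is hypothetical; it keeps only the ASCII digits 0-9 (like A(isdigit(A))) and converts the result with int(). Unlike int32(), Python's int does not saturate, and this sketch returns None when the input contains no digits.

```python
import string
from typing import Optional


def digits_to_int(text: str) -> Optional[int]:
    """Keep only the characters 0-9 from text and return them as an integer."""
    kept = "".join(ch for ch in text if ch in string.digits)  # like A(isdigit(A))
    if not kept:
        return None  # the input contained no digits
    return int(kept)  # like int32(str2num(...)), but without 32-bit saturation


print(digits_to_int("12wkd3"))          # 123
print(digits_to_int("324bhtk.p89u34"))  # 3248934
```

With keyboard input, this would be called as digits_to_int(input("A?\n")).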

Using CONTAINS with variables sql

OK, so I am trying to check one variable against another in SQL.
X = a,b,c,d (X is a string variable containing a list of items)
Y = b (Y is a string variable whose value may or may not appear in X)
I tried this:
Case when Y in (X) then 1 else 0 end as aa
But it doesn't work, since it looks for exact matches between X and Y.
I also tried this:
where contains(X,#Y)
but I can't create Y globally, since it is a variable that changes in each row of the table (X also changes).
A solution in SAS would also be useful.
Thanks
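Maybe LIKE will help (the + string concatenation below is SQL Server syntax; many other databases use || or CONCAT() instead):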
select
*
from
t
where
X like ('%'+Y+'%')
or
select
case when (X like ('%'+Y+'%')) then 1 else 0 end
from
t
In SAS I would use the INDEX function, either in a data step or in PROC SQL. It returns the position within the string at which it finds the character(s), or zero if there is no match, so testing whether the returned value is greater than zero yields a binary 1/0 result. You need to apply the COMPRESS function to the variable containing the search characters, because SAS pads character values with trailing blanks.
Data step solution:
aa=index(x,compress(y))>0;
Proc Sql solution:
index(x,compress(y))>0 as aa
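To try the per-row LIKE test without a SQL Server instance, here is a sketch using Python's built-in sqlite3 module with an in-memory database. The table t and its rows are made up for illustration. SQLite concatenates strings with || rather than +, and its instr() function (1-based position, or 0 if absent) plays the same role as the SAS INDEX test.

```python
import sqlite3

# In-memory database with a made-up table t(X, Y).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (X TEXT, Y TEXT)")
conn.executemany(
    "INSERT INTO t (X, Y) VALUES (?, ?)",
    [("a,b,c,d", "b"), ("a,b,c,d", "e"), ("x,y", "y")],
)

# The pattern is built per row from that row's Y.
rows = conn.execute(
    "SELECT X, Y, CASE WHEN X LIKE ('%' || Y || '%') THEN 1 ELSE 0 END AS aa FROM t"
).fetchall()
for row in rows:
    print(row)

# instr(X, Y) > 0 mirrors the SAS index(x, compress(y)) > 0 test.
flags = [r[0] for r in conn.execute("SELECT instr(X, Y) > 0 FROM t")]
print(flags)
```

Note that both tests match substrings, so a Y of "a" would also match an X containing "ab".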