How to create an iteration for a df - pandas

How can I turn these statements into a loop that increments the 0 one step at a time up to 25 (including the index in every df column name, e.g. ES_0_BME680_TEMP to ES_1_BME680_TEMP and so on up to 25) and produces output for all the calculations?
df['0_680ph20']=611.2*np.exp((17.625*df[['ES_0_BME680_TEMP']])/(243.12+df[['ES_0_BME680_TEMP']]))
df['0_680aH']=(df['ES_0_BME680_RH'] /100)*(df['0_680ph20']/(461.52*(df['ES_0_BME680_TEMP']+273.15)))*1000
df['0_680LN']=np.log(((df['0_680aH']/1000)*461.52*(df['ES_0_BME680_TEMP']+273.15))/(0.5*611.2))
df['0_680T_tar']=(df['0_680LN']*243.12)/(17.625-df['0_680LN'])
df['0_688ph20']=611.2*np.exp((17.625*df[['ES_0_BME688_TEMP']])/(243.12+df[['ES_0_BME688_TEMP']]))
df['0_688aH']=(df['ES_0_BME688_RH'] /100)*(df['0_688ph20']/(461.52*(df['ES_0_BME688_TEMP']+273.15)))*1000
df['0_688LN']=np.log(((df['0_688aH']/1000)*461.52*(df['ES_0_BME688_TEMP']+273.15))/(0.5*611.2))
df['0_688T_tar']=(df['0_688LN']*243.12)/(17.625-df['0_688LN'])
thank you.

You could use a for loop, with f-strings to build the column names. Something like:
for n in range(26):
    df[f'{n}_680ph20']=611.2*np.exp((17.625*df[[f'ES_{n}_BME680_TEMP']])/(243.12+df[[f'ES_{n}_BME680_TEMP']]))
    df[f'{n}_680aH']=(df[f'ES_{n}_BME680_RH'] /100)*(df[f'{n}_680ph20']/(461.52*(df[f'ES_{n}_BME680_TEMP']+273.15)))*1000
    df[f'{n}_680LN']=np.log(((df[f'{n}_680aH']/1000)*461.52*(df[f'ES_{n}_BME680_TEMP']+273.15))/(0.5*611.2))
    df[f'{n}_680T_tar']=(df[f'{n}_680LN']*243.12)/(17.625-df[f'{n}_680LN'])
    df[f'{n}_688ph20']=611.2*np.exp((17.625*df[[f'ES_{n}_BME688_TEMP']])/(243.12+df[[f'ES_{n}_BME688_TEMP']]))
    df[f'{n}_688aH']=(df[f'ES_{n}_BME688_RH'] /100)*(df[f'{n}_688ph20']/(461.52*(df[f'ES_{n}_BME688_TEMP']+273.15)))*1000
    df[f'{n}_688LN']=np.log(((df[f'{n}_688aH']/1000)*461.52*(df[f'ES_{n}_BME688_TEMP']+273.15))/(0.5*611.2))
    # the denominator must use the current n, not a hard-coded 0
    df[f'{n}_688T_tar']=(df[f'{n}_688LN']*243.12)/(17.625-df[f'{n}_688LN'])
Also, I see that you are doing the same four operations for two different sensor numbers (680 and 688). You could create a function to do that, something like:
def expandata(df, digits, nitems):
    # nitems is the highest index, so range(nitems+1) covers 0..nitems
    for n in range(nitems+1):
        df[f'{n}_{digits}ph20']=611.2*np.exp((17.625*df[[f'ES_{n}_BME{digits}_TEMP']])/(243.12+df[[f'ES_{n}_BME{digits}_TEMP']]))
        df[f'{n}_{digits}aH']=(df[f'ES_{n}_BME{digits}_RH'] /100)*(df[f'{n}_{digits}ph20']/(461.52*(df[f'ES_{n}_BME{digits}_TEMP']+273.15)))*1000
        df[f'{n}_{digits}LN']=np.log(((df[f'{n}_{digits}aH']/1000)*461.52*(df[f'ES_{n}_BME{digits}_TEMP']+273.15))/(0.5*611.2))
        df[f'{n}_{digits}T_tar']=(df[f'{n}_{digits}LN']*243.12)/(17.625-df[f'{n}_{digits}LN'])
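To see the function approach end to end, here is a minimal sketch on made-up data. The function name add_derived_columns, the two-sensor demo frame, and all numeric inputs are assumptions for illustration; only the column naming scheme and the formulas come from the question.

```python
import numpy as np
import pandas as pd


def add_derived_columns(df, model, n_sensors):
    """Add ph20, aH, LN and T_tar columns for sensors 0..n_sensors-1 of one model."""
    for n in range(n_sensors):
        temp = df[f"ES_{n}_BME{model}_TEMP"]
        rh = df[f"ES_{n}_BME{model}_RH"]
        ph20 = 611.2 * np.exp(17.625 * temp / (243.12 + temp))
        ah = (rh / 100) * (ph20 / (461.52 * (temp + 273.15))) * 1000
        ln = np.log((ah / 1000) * 461.52 * (temp + 273.15) / (0.5 * 611.2))
        df[f"{n}_{model}ph20"] = ph20
        df[f"{n}_{model}aH"] = ah
        df[f"{n}_{model}LN"] = ln
        df[f"{n}_{model}T_tar"] = (ln * 243.12) / (17.625 - ln)
    return df


# Hypothetical input: two sensors per model, two rows, RH fixed at 50.
data = {}
for model in ("680", "688"):
    for n in range(2):
        data[f"ES_{n}_BME{model}_TEMP"] = [20.0 + n, 25.0 + n]
        data[f"ES_{n}_BME{model}_RH"] = [50.0, 50.0]
demo = pd.DataFrame(data)

for model in ("680", "688"):
    add_derived_columns(demo, model, n_sensors=2)

print(demo.filter(like="T_tar"))
```

With RH at exactly 50 the formulas cancel algebraically, so each T_tar column should equal its input temperature, which makes a convenient sanity check. For the real data you would call the function with n_sensors=26 to cover indices 0 to 25.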

Related

Pandas run function only on subset of whole Dataframe

Let's say I have a DataFrame with 200 values, prices for products. I want to run some operation on this DataFrame, like calculating the average price over the last 10 prices.
The way I understand it, pandas will currently go through every single row and calculate an average for each row. That is, the first 9 rows will be NaN, and for rows 10-200 it will calculate the average for each row.
My issue is that I need to do a lot of these calculations and performance is an issue. For that reason, I want to run the average only on, say, the last 10 values (I don't need more), while keeping all values in the DataFrame. That is, I don't want to get rid of those values or create a new DataFrame.
I essentially just want to do the calculation on less data, so it is faster.
Is something like that possible? Hopefully the question is clear.

Building off Chicodelarose's answer, you can achieve this in a more "pandas-like" syntax.
Defining your df as follows, we get 200 prices in the range [0, 1000).
df = pd.DataFrame((np.random.rand(200) * 1000.).round(decimals=2), columns=["price"])
The bit you're looking for, though, would be the following:
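def add10(n: float) -> float:
    """An exceptionally simple function to demonstrate you can set
    values, too.
    """
    return n + 10
# assign through a single .loc call; chained df["price"].iloc[...] = ... may not write back to df
df.loc[df.index[-12:], "price"] = df["price"].iloc[-12:].apply(add10)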
Of course, you can also use these selections to return something else without setting values, too.
>>> df["price"].iloc[-12:].mean().round(decimals=2)
309.63 # this will, of course, be different as we're using random numbers
The primary justification for this approach lies in the use of pandas tooling. If you want to operate over a subset of your data with multiple columns, you simply need to give your .apply(...) an axis parameter, as follows: .apply(fn, axis=1).
This becomes much more readable the longer you spend in pandas. 🙂
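As a rough illustration of the axis=1 variant mentioned above, the sketch below applies a row-wise function to only the last two rows of a small frame. The qty column and the line_total helper are made up for the example and are not part of the question.

```python
import pandas as pd

# Hypothetical two-column frame; only "price" appears in the original question.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0], "qty": [1, 2, 3, 4]})


def line_total(row):
    """Combine several columns of one row into a single value."""
    return row["price"] * row["qty"]


# Only the last two rows are passed to the function; df keeps all four rows.
last_two = df.iloc[-2:].apply(line_total, axis=1)
print(last_two.tolist())  # [90.0, 160.0]
```

The result keeps the original index labels (2 and 3 here), so it can be aligned back to df if needed.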

Given a dataframe like the following:
      Price
0    197.45
1     59.30
2    131.63
3    127.22
4     35.22
..      ...
195   73.05
196   47.73
197  107.58
198  162.31
199  195.02

[200 rows x 1 columns]
Call the following to obtain the mean over the last n rows of the dataframe:
def mean_over_n_last_rows(df, n, colname):
    return df.iloc[-n:][colname].mean().round(decimals=2)

print(mean_over_n_last_rows(df, 2, "Price"))
Output:
178.67
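To connect this back to the performance concern, here is a small sketch comparing a mean over just the last 10 rows with a rolling mean over the whole column. The prices 1.0 to 200.0 are made up so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical data: 200 prices, 1.0 through 200.0.
df = pd.DataFrame({"Price": [float(i) for i in range(1, 201)]})

# Reads only the last 10 rows; df itself is left untouched.
last10_mean = df["Price"].tail(10).mean()

# A rolling mean computes a value for every row; its final entry is the same number.
rolling_last = df["Price"].rolling(10).mean().iloc[-1]

print(last10_mean)  # 195.5
```

Both give the mean of 191 to 200, but the first version never touches the other 190 rows, which is the saving the question is after.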

How do I reverse each value in a column bit wise for a hex number?

I have a dataframe which has a column called hexa which has hex values like this. They are of dtype object.
hexa
0 00802259AA8D6204
1 00802259AA7F4504
2 00802259AA8D5A04
I would like to remove the first and last bytes (two hex characters each) and reverse the order of the remaining bytes, as follows:
hexa-rev
0 628DAA592280
1 457FAA592280
2 5A8DAA592280
Please help

I'll show you the complete solution up here and then explain its parts below:
def reverse_bits(bits):
    trimmed_bits = bits[2:-2]
    list_of_bits = [i+j for i, j in zip(trimmed_bits[::2], trimmed_bits[1::2])]
    reversed_bits = [list_of_bits[-i] for i in range(1,len(list_of_bits)+1)]
    return ''.join(reversed_bits)

df['hexa-rev'] = df['hexa'].apply(reverse_bits)
There are possibly a couple of ways of doing it, but this way should solve your problem. The general strategy is to define a function and then use the apply() method to apply it to all values in the column. It should look something like this:
df['hexa-rev'] = df['hexa'].apply(reverse_bits)
Now we need to define the function we're going to apply. Breaking it down into its parts, we first strip the first and last byte (two hex characters each) by slicing. Because of how negative indexes work, this removes the first and last byte regardless of the string's length. The result is a string of hex characters that we will regroup and join together after processing.
def reverse_bits(bits):
    trimmed_bits = bits[2:-2]
The second line iterates through the characters in pairs, matching the first and second character of each byte, and concatenates each pair into a single two-character string representing that byte.
def reverse_bits(bits):
    trimmed_bits = bits[2:-2]
    list_of_bits = [i+j for i, j in zip(trimmed_bits[::2], trimmed_bits[1::2])]
The second-to-last line returns the list you just made in reverse order. Lastly, the function joins that list and returns it as a single string.
def reverse_bits(bits):
    trimmed_bits = bits[2:-2]
    list_of_bits = [i+j for i, j in zip(trimmed_bits[::2], trimmed_bits[1::2])]
    reversed_bits = [list_of_bits[-i] for i in range(1,len(list_of_bits)+1)]
    return ''.join(reversed_bits)
I explained it in reverse order, but the idea is to define the function you want applied to your column first, and then use apply() to make it happen.
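As an alternative sketch, the same trimming and reversing can be done on real bytes with the standard library, assuming every value is a valid even-length hex string. The helper name reverse_inner_bytes is made up for this example.

```python
def reverse_inner_bytes(hex_string):
    """Drop the first and last byte, then reverse the order of the rest."""
    raw = bytes.fromhex(hex_string)       # "0080..." -> b"\x00\x80..."
    return raw[1:-1][::-1].hex().upper()  # trim, reverse, back to upper-case hex


values = ["00802259AA8D6204", "00802259AA7F4504", "00802259AA8D5A04"]
reversed_values = [reverse_inner_bytes(v) for v in values]
print(reversed_values)
# ['628DAA592280', '457FAA592280', '5A8DAA592280']
```

With a DataFrame, the same helper could be passed to df['hexa'].apply(...) in place of reverse_bits.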

Finding the count of a set of substrings in pandas dataframe

I am given a set of substrings. I need to count the occurrences of each of those substrings in a particular column of a dataframe. The relevant dataframe looks like this:
training['concat']
0 svAxu$paxArWAn
1 xvAxaSa$varRANi
2 AxAna$xurbale
3 go$BakwAH
4 viXi$Bexena
5 nIwi$kuSalaM
6 lafkA$upamam
7 yaSas$lipsoH
8 kaSa$AGAwam
9 hewumaw$uwwaram
10 varRa$pUgAn
My set of substrings is a dictionary, where the keys are the substrings and the values are the probabilities with which they occur:
reg = {'anuBavAn':0.35, 'a$piwra':0.2 ...... 'piwra':0.7, 'pa':0.03, 'a':0.0005}
#The length of the dictionary is 2000
In particular, I need to find those substrings which occur more than twice.
I have written the following code, which performs the task. Is there a more elegant Pythonic or pandas-specific way to achieve the same thing? The current implementation takes quite some time to execute.
elites = dict()
for reg_pat in reg_:
    count = 0
    eliter = len(training[training['concat'].str.contains(reg_pat)]['concat'])
    if eliter >=3:
        elites[reg_pat] = reg_[reg_pat]

You can use apply instead of str.contains; it is faster:
reg_ = {'anuBavAn':0.35, 'a$piwra':0.2, 'piwra':0.7, 'pa':0.03, 'a':0.0005}
elites = dict()
for reg_pat in reg_:
    if training['concat'].apply(lambda x: reg_pat in x).sum() >= 3:
        elites[reg_pat] = reg_[reg_pat]
print (elites)
{'a': 0.0005}
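One more point worth checking: by default str.contains treats the pattern as a regular expression, so a key such as 'a$piwra' would have its $ read as an end-of-string anchor rather than a literal character. The sketch below keeps str.contains but passes regex=False for plain substring matching, using the sample rows from the question and the five keys from the answer above. Whether this is faster than apply on the full 2000-key dictionary is not something this example measures.

```python
import pandas as pd

training = pd.DataFrame({"concat": [
    "svAxu$paxArWAn", "xvAxaSa$varRANi", "AxAna$xurbale", "go$BakwAH",
    "viXi$Bexena", "nIwi$kuSalaM", "lafkA$upamam", "yaSas$lipsoH",
    "kaSa$AGAwam", "hewumaw$uwwaram", "varRa$pUgAn",
]})
reg_ = {"anuBavAn": 0.35, "a$piwra": 0.2, "piwra": 0.7, "pa": 0.03, "a": 0.0005}

# regex=False makes "$" a literal character instead of a regex anchor.
elites = {
    pat: prob
    for pat, prob in reg_.items()
    if training["concat"].str.contains(pat, regex=False).sum() >= 3
}
print(elites)  # {'a': 0.0005}
```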

Hopefully I have interpreted your question correctly. I'm inclined to stay away from regex here (in fact, I've never used it in conjunction with pandas), but it's not wrong, strictly speaking. In any case, I find it hard to believe that any regex operation is faster than a simple in check, but I could be wrong on that.
for substr in reg:
    totalStringAppearances = training['concat'].apply(lambda string: substr in string)
    totalStringAppearances = totalStringAppearances.sum()
    if totalStringAppearances > 2:
        reg[substr] = totalStringAppearances / len(training)
    else:
        # do what you want to with the very rare substrings
        pass
Some gotchas:
If you wanted something like a substring 'a' in 'abcdefa' to return 2, then this will not work. It merely checks for the existence of the substring in each string.
Inside the apply(), I am relying on booleans being summed as 0 and 1, which is potentially unreliable. See this question for more details.
Post-edit: Jezrael's answer is more complete, as it uses the same variable names. In a simple test of regex versus apply with in, though, I could confirm both his claim and my presumption.

Apply function with pandas dataframe - POS tagger computation time

I'm very confused about the apply function in pandas. I have a big dataframe where one column is a column of strings. I'm using a function to count part-of-speech occurrences, but I'm not sure how to set up my apply statement or my function.
def noun_count(row):
    x = tagger(df['string'][row].split())
    # array flattening and filtering out all but nouns, then summing them
    return num
So basically I have a function similar to the above, where I use a POS tagger on a column and output a single number (the number of nouns). I may rewrite it to output multiple numbers for different parts of speech, but I can't wrap my head around apply.
I'm pretty sure I don't have either part arranged correctly. For instance, I can run noun_count(row) and get the correct value for any index, but I can't figure out how to make it work with apply the way I have it set up. Basically, I don't know how to pass the row value to the function within the apply statement.
df['num_nouns'] = df.apply(noun_count(??),1)
Sorry this question is all over the place. So what can I do to get a simple result like
       string  num_nouns
0       'cat'          1
1  'two cats'          1
EDIT:
So I've managed to get something working by using list comprehension (someone posted an answer, but they've deleted it).
df['string'].apply(tagger_nouns)
which required an adjustment to my function:
from collections import Counter

def tagger_nouns(x):
    # st is the Stanford tagger instance created elsewhere
    list_of_lists = st.tag(x.split())
    flat = [y for z in list_of_lists for y in z]
    Parts_of_speech = [row[1] for row in flat]
    c = Counter(Parts_of_speech)
    nouns = c['NN']+c['NNS']+c['NNP']+c['NNPS']
    return nouns
I'm using the Stanford tagger with the left 3 words model, but I have a big problem with computation time. I'm noticing that it calls the .jar file again and again (Java keeps opening and closing in the task manager). Maybe that's unavoidable, but it's taking far too long to run. Is there any way I can speed it up?

I don't know what 'tagger' is but here's a simple example with a word count that ought to work more or less the same way:
f = lambda x: len(x.split())
df['num_words'] = df['string'].apply(f)
       string  num_words
0       'cat'          1
1  'two cats'          2
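The same pattern carries over to the noun count. In the sketch below, toy_tag and its TOY_TAGS lookup table are hypothetical stand-ins for a real POS tagger (they are not the Stanford tagger API); the point is only that Series.apply hands each cell's string to the function, so the function takes the text rather than a row index.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for a real tagger: maps a few words to Penn Treebank tags.
TOY_TAGS = {"cat": "NN", "cats": "NNS", "two": "CD"}
NOUN_TAGS = ("NN", "NNS", "NNP", "NNPS")


def toy_tag(words):
    """Return (word, tag) pairs; unknown words get the placeholder tag 'XX'."""
    return [(w, TOY_TAGS.get(w, "XX")) for w in words]


def noun_count(text):
    """Count noun tags in one string; receives the cell value, not an index."""
    tags = Counter(tag for _, tag in toy_tag(text.split()))
    return sum(tags[t] for t in NOUN_TAGS)


df = pd.DataFrame({"string": ["cat", "two cats"]})
df["num_nouns"] = df["string"].apply(noun_count)
print(df)
```

This only addresses how to wire the function into apply; it does not speak to the tagger's start-up cost.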

Using CONTAINS with variables sql

OK, so I am trying to check one variable against another in SQL.
X= a,b,c,d (X is a string variable with a list of things in it)
Y= b (Y is a string variable that may or may not have a value that appears in X)
I tried this:
Case when Y in (X) then 1 else 0 end as aa
But it doesn't work, since it looks for exact matches between X and Y.
I also tried this:
where contains(X,#Y)
but I can't create Y globally, since it is a variable that changes in each row of the table (X also changes).
A solution in SAS would also be useful.
Thanks

Maybe LIKE will help:
select *
from t
where X like ('%'+Y+'%')
or
select case when (X like ('%'+Y+'%')) then 1 else 0 end
from t
SQLFiddle example
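For a quick way to try the LIKE idea, here is a sketch using Python's built-in sqlite3 module. It assumes SQLite, where strings are concatenated with || instead of the + used above, and the three sample rows are made up. The second column, aa_exact, is an added variation that wraps both sides in commas so that 'b' does not match inside 'ab'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (X TEXT, Y TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a,b,c,d", "b"), ("a,b,c,d", "e"), ("ab,cd", "b")],  # hypothetical rows
)

rows = conn.execute(
    """
    SELECT X, Y,
           CASE WHEN X LIKE '%' || Y || '%' THEN 1 ELSE 0 END AS aa,
           CASE WHEN ',' || X || ',' LIKE '%,' || Y || ',%' THEN 1 ELSE 0 END AS aa_exact
    FROM t
    ORDER BY rowid
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

The plain LIKE flags the third row because 'b' occurs inside 'ab', while the comma-wrapped version only matches whole list items.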

In SAS I would use the INDEX function, either in a data step or proc sql. This returns the position within the string in which it finds the character(s), or zero if there is no match. Therefore a test if the value returned is greater than zero will result in a binary 1:0 output. You need to use the compress function with the variable containing the search characters as SAS pads the value with blanks.
Data step solution :
aa=index(x,compress(y))>0;
Proc Sql solution :
index(x,compress(y))>0 as aa
