Summing randomly generated values in Excel - vba

I am trying to randomly select values from a short list, sum them, then iterate that process many times. So far I think I am successful in generating the values:
Work so far (columns: item number, price, RAND() draw, hit flag (1 if RAND() < 0.125), random item number drawn, and the looked-up price or FALSE; rows without an item number are attempts outside the 15-row price list):
0.4855981 0 0 FALSE
Price 0.337666609 0 0 FALSE
1 $29.74 0.808816075 0 0 FALSE
2 $0.85 0.906751484 0 0 FALSE
3 $38.24 0.10572928 1 4 17
4 $17 0.957321497 0 0 FALSE
5 $25.50 0.644195743 0 0 FALSE
6 $18.70 0.133302328 0 0 FALSE
7 $29.75 0.907771163 0 0 FALSE
8 $14.45 0.156311546 0 0 FALSE
9 $30.60 0.871958447 0 0 FALSE
10 $26.34 0.0790938 1 14 24.65
11 $11.05 0.696383544 0 0 FALSE
12 $124.95 0.080728462 1 3 38.24
13 $9.35 0.03717127 1 10 26.34
14 $24.65 0.970430159 0 0 FALSE
15 $41.65 0.814402286 0 0 FALSE
0.462967917 0 0 FALSE
0.646432058 0 0 FALSE
0.49384003 0 0 FALSE
0.381349746 0 0 FALSE
0.129594937 0 0 FALSE
0.576582174 0 0 FALSE
0.37483142 0 0 FALSE
Total 106.23
In any set there are 24 attempts at selecting an item from the price list. What I have done is randomly generate a number, and if it is less than 0.125 (a 1/8 chance of getting a value from the price list per attempt) I generate a random number between 1 and 15 and then VLOOKUP to get the price.
However, I want to iterate this process many times: say, over 100 sets of 24 attempts each, what is the average value I return? I cannot find a way to simply add the total to a running sum each time I update the random numbers, and my VBA is pretty limited; I was considering a loop with a clickbutton to refresh the numbers. Pseudo code, since I know very little VBA:
for i = 1:100
    clickbutton()   # to refresh the random numbers
    grandtotal = grandtotal + total
end
averagevalue = grandtotal / 100
I know it seems really easy, but I have not had luck searching for how to refresh with the clickbutton, or whether that is even the best way. Thanks!

Pseudo code is a good place to start! If you name the total cell in your sheet "Total", you should be able to run the below, which is my take on your pseudo code:
Sub AverageTotal()
    Dim i As Long
    Dim noRuns As Long
    Dim grandTotal As Double
    noRuns = 100
    grandTotal = 0
    For i = 1 To noRuns
        ActiveSheet.Calculate    ' recalculate the sheet so RAND() draws new values
        grandTotal = grandTotal + Range("Total").Value
    Next i
    MsgBox "The average value is " & grandTotal / noRuns
End Sub
If that doesn't work, have a play with it and let me know how you get on.
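The same simulation can also be run outside Excel entirely; here is a minimal Python sketch of the process described in the question (price list and the 1/8 hit chance taken from the question, 100 runs assumed):

```python
import random

# Price list from the question (items 1-15)
prices = [29.74, 0.85, 38.24, 17.00, 25.50, 18.70, 29.75, 14.45,
          30.60, 26.34, 11.05, 124.95, 9.35, 24.65, 41.65]

def one_set(attempts=24, hit_chance=0.125):
    """One set: 24 attempts, each with a 1/8 chance of drawing a random item."""
    total = 0.0
    for _ in range(attempts):
        if random.random() < hit_chance:
            total += random.choice(prices)
    return total

runs = 100
average = sum(one_set() for _ in range(runs)) / runs
print(average)
```

With 24 attempts at a 1/8 hit rate, each set picks about 3 items on average, so the result should hover around 3 times the mean price.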

Related

Handling features with multiple values per instance in Python for Machine Learning model

I am trying to handle my data set, which contains some features that have multiple values per instance, as shown in the image:
https://i.stack.imgur.com/D78el.png
I am trying to separate the values on the '|' symbol so I can apply one-hot encoding, but I can't find a suitable solution to my problem.
My idea is to keep the multiple values in one row, or in other words to convert each cell to a list of integers.
Maybe this is what you want:
df = pd.DataFrame(['465','444','465','864|857|850|843'],columns=['genre_ids'])
df
genre_ids
0 465
1 444
2 465
3 864|857|850|843
df['genre_ids'].str.get_dummies(sep='|')
444 465 843 850 857 864
0 0 1 0 0 0 0
1 1 0 0 0 0 0
2 0 1 0 0 0 0
3 0 0 1 1 1 1
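If the goal is instead the "list of integers per cell" representation mentioned in the question, a small sketch (column name and data taken from the example above, new column name `genre_list` is just illustrative):

```python
import pandas as pd

df = pd.DataFrame(['465', '444', '465', '864|857|850|843'], columns=['genre_ids'])

# split each cell on '|' and convert every piece to int, giving a list per cell
df['genre_list'] = df['genre_ids'].str.split('|').apply(lambda ids: [int(i) for i in ids])
print(df['genre_list'].tolist())
# → [[465], [444], [465], [864, 857, 850, 843]]
```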

SQL: Is there a way I can find whether a value is within a specific index range of another value?

I have two columns filled with mostly 0's and a few 1's. I want to check whether, if a 1 occurs in the first column, a 1 occurs in the second column within 5 rows of that index. So for example, let's say a 1 occurs in column 1 at row 83; then I would like to return TRUE if one or more 1's occur in column 2 in rows 83-88, and FALSE if this is not the case. Examples are listed in the blocks below. I would then want to count the number of TRUE and FALSE occurrences.
TRUE:
0 0
0 0
0 0
1 1
0 0
0 0
0 0
0 0
0 0
0 0
TRUE:
0 0
0 0
0 0
1 0
0 0
0 0
0 1
0 1
0 0
0 0
FALSE:
0 0
0 0
0 1
1 0
0 0
0 0
0 0
0 0
0 0
0 1
I have no idea where to begin, so I do not have any code to start with:(
Kind regards,
Kai
Assuming you have an ordering column, you can use window functions:
select sum(case when max_col2_next5 = 1 then 1 else 0 end) as num_true,
       sum(case when max_col2_next5 = 0 then 1 else 0 end) as num_false
from (select t.*,
             max(col2) over (order by <ordering column>
                             rows between current row and 5 following
                            ) as max_col2_next5
      from t
     ) t
where col1 = 1;
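If the data ever ends up in pandas rather than SQL, the same check can be sketched with a forward-looking rolling max (column names `col1`/`col2` assumed, data from the second example in the question):

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
                   'col2': [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]})

# max of col2 over the current row and the next 5 rows:
# rolling() only looks backwards, so reverse, roll, and reverse again
fwd_max = df['col2'][::-1].rolling(6, min_periods=1).max()[::-1]

# one True/False per 1 in col1
result = fwd_max[df['col1'].eq(1)].eq(1)
print(result.tolist())
```

For the second example above this prints `[True]`, since the 1's in col2 fall 3 and 4 rows after the 1 in col1.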

rolling sum of a column in pandas dataframe at variable intervals

I have a list of index numbers that represent index locations for a DF. list_index = [2,7,12]
I want to sum a single column in the DF by rolling through each number in list_index and totaling the counts between the index points (restarting the count at 0 after each index point). Here is a mini example.
The desired output is the OUTPUT column, which increments every time there is another 1 in COL 1 and RESTARTS the count at 0 at the location after each number in list_index.
I was able to get it to work with a loop, but there are millions of rows in the DF and the loop takes a while to run. It seems like I need a lambda function with a sum, but I would need to input the start and end points of each interval.
Something like lambda x: x.rolling(start_index, end_index).sum()? Can anyone help me out with this?
You can try a cumulative sum, keeping only the information related to the 1 values; a rolling sum with varying interval lengths is not directly possible:
a = df['col'].eq(1).cumsum()            # running count of the 1s
# subtract the count as of the most recent 0, so the count restarts after each 0
df['output'] = a - a.mask(df['col'].eq(1)).ffill().fillna(0).astype(int)
Out:
col output
0 0 0
1 1 1
2 1 2
3 0 0
4 1 1
5 1 2
6 1 3
7 0 0
8 0 0
9 0 0
10 0 0
11 1 1
12 1 2
13 0 0
14 0 0
15 1 1
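If the count should instead restart only after each position in list_index (as the question describes), one possible vectorized sketch uses np.searchsorted to label the segments and a grouped cumulative sum (data below is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1]})
list_index = [2, 7, 12]

# segment label per row: rows up to and including each index point share a label,
# so the count restarts at the row after each index point
segment = np.searchsorted(list_index, df.index, side='left')

# cumulative count of 1s within each segment
df['output'] = df.groupby(segment)['col'].cumsum()
print(df['output'].tolist())
```

This avoids the Python-level loop entirely, which should matter at millions of rows.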

Undersampling for multilabel imbalanced datasets in pandas

I'm working on a roll-your-own undersampling function, since imblearn does not work neatly with multi-label classification (e.g. it only accepts a one-dimensional y).
I want to iterate through X and y, removing a row every 2 or 3 rows that belong to the majority class. The goal is a quick and dirty way to reduce the number of rows in the majority class.
def undersample(X, y):
    counter = 0
    for index, row in y.iterrows():   # iterrows, since rows are indexed by label below
        if row['rectangle_here'] == 0:
            counter += 1
            if counter > 3:
                counter = 0
                X.drop(index, inplace=True)
                y.drop(index, inplace=True)
    return X, y
But it crashes my kernel on even a small number of rows (~30,000).
y is something like this, where any time f2 or f3 is present, f1 is also present.
So, let's count the number of times 0 occurs in f1 and then delete every 3rd such 0 row:
f1 f2 f3
0 0 0 0
1 0 0 0
2 0 0 0
3 1 0 1
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
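One way to avoid the row-by-row drop() calls (which is what makes the loop above so expensive) is a boolean mask built in a single pass. A hedged sketch, assuming the majority class is marked by f1 == 0 as in the example:

```python
import pandas as pd

def undersample(X, y, every=3):
    # majority-class rows: f1 == 0 (column name taken from the example above)
    is_majority = y['f1'].eq(0)
    # running count of majority rows seen so far
    nth = is_majority.cumsum()
    # drop every `every`-th majority row, all at once
    drop = is_majority & nth.mod(every).eq(0)
    return X.loc[~drop], y.loc[~drop]

# toy data shaped like the example
y = pd.DataFrame({'f1': [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
                  'f2': [0] * 10,
                  'f3': [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]})
X = pd.DataFrame({'a': range(10)})
X2, y2 = undersample(X, y)
print(len(X2), len(y2))   # 7 7
```

Building the mask once and indexing with it keeps everything vectorized, so it should scale to millions of rows without touching the kernel's memory the way repeated inplace drops do.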

dataframe columnwise comparision to another series

It seems DataFrame.le doesn't operate in a column-wise fashion.
from pandas import DataFrame, Series
from numpy.random import randn, rand

df = DataFrame(randn(8, 12))
series = Series(rand(8))
df.le(series)
I would expect each column in df to be compared against the series (12 column comparisons in total, each involving 8 rows, so 12 columns * 8 rows of comparisons). But it appears each element in df is compared against every element in series, which would involve 12 (columns) * 8 (rows) * 8 (elements in series) comparisons. How can I achieve a column-by-column comparison?
Second question: once I am done with the column-wise comparison, I want to count how many True values there are in each row. I am currently doing astype(int32) to turn bool into int and then summing; does this sound reasonable?
Let me give an example for the first question to show what I mean (using a smaller example, since showing 8*12 is unwieldy):
>>>from pandas import *
>>>from numpy.random import *
>>>df = DataFrame(randn(2,5))
>>>t = DataFrame(randn(2,1))
>>>df
0 1 2 3 4
0 -0.090283 1.656517 -0.183132 0.904454 0.157861
1 1.667520 -1.242351 0.379831 0.672118 -0.290858
>>>t
0
0 1.291535
1 0.151702
>>>df.le(t)
0 1 2 3 4
0 True False False False False
1 False False False False False
What I expect df's column 1 to be:
1
False
True
Because 1.656517 <= 1.291535 is False and -1.242351 <= 0.151702 is True; this is column-wise comparison. However the printout is False False.
I'm not sure I understand the first part of your question, but as to the second part, you can count the Trues in a boolean DataFrame using sum:
In [11]: df.le(s).sum(axis=0)
Out[11]:
0 4
1 3
2 7
3 3
4 6
5 6
6 7
7 6
8 0
9 0
10 0
11 0
dtype: int64
Essentially le is testing for each column:
In [21]: df[0] < s
Out[21]:
0 False
1 True
2 False
3 False
4 True
5 True
6 True
7 True
dtype: bool
Which for each index is testing:
In [22]: df[0].loc[0] < s.loc[0]
Out[22]: False
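For the first part of the question, the axis argument of le may be what's missing: by default le aligns a Series on the column labels, while axis=0 aligns it on the index, so each column of df is compared against the Series. A sketch using the values from the example (with a Series in place of the one-column DataFrame t):

```python
import pandas as pd

df = pd.DataFrame([[-0.090283, 1.656517, -0.183132, 0.904454, 0.157861],
                   [1.667520, -1.242351, 0.379831, 0.672118, -0.290858]])
s = pd.Series([1.291535, 0.151702])

# compare every column of df against s, aligned on the row index
result = df.le(s, axis=0)
print(result[1].tolist())   # column 1: [False, True], as the question expects
```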