Complex Formulas within Excel Using VBA - vba

I am working on vba code where I have data (for Slope Inclinometers) at various depths like so:
Depth A0 A180 Checksum B0 B180 Checksum
4.5 (-1256) 1258 2 (-394) 378 (-16)
4.5 (-1250) 1257 7 (-396) 376 (-20)
4.5 (-1257) 1257 0 (-400) 374 (-26)
Depth A0 A180 Checksum B0 B180 Checksum
5 (-1214) 1214 0 (-472) 459 (-13)
5 (-1215) 1212 -3 (-472) 455 (-17)
5 (-1216) 1211 -5 (-473) 455 (-18)
UNKNOWN AMOUNT OF DATA WILL BE PRESENT (depends how much the user transfers to this sheet)
Now I need to be able to calculate the A Axis Displacement, the B Axis Displacement, and the Resultant, which have the following formulas:
A Axis Displacement = [((A0-A180)/2)-((A0*-A180*)/2)]*(constant/constant)
Where * marks the initial reading, which is always the first row of data at that specified depth.
B Axis Displacement = [((B0-B180)/2)-((B0*-B180*)/2)]*(constant/constant)
Where * marks the initial reading, which is always the first row of data at that specified depth.
Resultant = SQRT[(A Axis Displacement)^2 + (B Axis Displacement)^2]
I'm struggling to find examples of how I can implement this using VBA, as there will be various depths (an unknown number) on the same sheet, and the formula will need to start over at each new depth.
Any help/tips would be greatly appreciated!

how I can implement this using vba as there will be various depths present...
You can still do it purely with formulas and easy auto-fill, because the formula can find the first occurrence of the current depth and perform all the necessary calculations, leaving header rows and blank rows empty. For instance, you can enter these formulas at row 2 and fill down all the rows.
H2 (A Axis Displacement):
=IF(ISNUMBER($A2),0.5*(B2-C2-VLOOKUP($A2,$A:$F,2,0)+VLOOKUP($A2,$A:$F,3,0)), "")
I2 (B Axis Displacement):
=IF(ISNUMBER($A2),0.5*(E2-F2-VLOOKUP($A2,$A:$F,5,0)+VLOOKUP($A2,$A:$F,6,0)), "")
J2 (Resultant):
=IF(ISNUMBER($A2),SQRT(SUMSQ(H2,I2)),"")
P.S. In the displacement formulas I omitted the (constant/constant) factor as it is irrelevant to the answer; you can easily multiply the 0.5 factor by anything you need.
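For readers who prefer to check the logic outside the spreadsheet, here is a minimal sketch of the same calculation in Python (not VBA). It assumes the rows arrive in sheet order as (depth, A0, A180, B0, B180) tuples with header rows already dropped; the function name `displacements` and the `factor` parameter (standing in for constant/constant) are illustrative, not from the original post.

```python
import math

def displacements(rows, factor=1.0):
    """Return (depth, a_disp, b_disp, resultant) for each row.

    rows: iterable of (depth, a0, a180, b0, b180) in sheet order.
    The first row seen for each depth is treated as the initial reading.
    factor: stands in for the (constant/constant) term, assumed 1.0 here.
    """
    baseline = {}
    results = []
    for depth, a0, a180, b0, b180 in rows:
        # Remember the first reading at this depth (the "*" values).
        if depth not in baseline:
            baseline[depth] = (a0, a180, b0, b180)
        a0i, a180i, b0i, b180i = baseline[depth]
        a_disp = ((a0 - a180) / 2 - (a0i - a180i) / 2) * factor
        b_disp = ((b0 - b180) / 2 - (b0i - b180i) / 2) * factor
        results.append((depth, a_disp, b_disp, math.hypot(a_disp, b_disp)))
    return results

# Sample rows from the question (parenthesised values are negatives).
rows = [
    (4.5, -1256, 1258, -394, 378),
    (4.5, -1250, 1257, -396, 376),
    (4.5, -1257, 1257, -400, 374),
    (5, -1214, 1214, -472, 459),
    (5, -1215, 1212, -472, 455),
    (5, -1216, 1211, -473, 455),
]

for row in displacements(rows):
    print(row)
```

The dictionary lookup plays the same role as the VLOOKUP in the formulas: it finds the first row recorded for the current depth.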

Related

using 'loop' or 'for' with table data to pull each row data and use the pulled data for two parameters in gams

I am new to GAMS and I have table data with 3 rows and 6 columns. I want to pull each row and use its data for two parameters (each row has 6 elements; the first three should go to one parameter and the other three to the second) using a loop or for statement. I tried both: with the loop I received a zero value for my parameter, which is incorrect, and with the for statement I received some errors.
This is my code for the first row, in which both 'loop' and 'for' are used (I ran them separately each time, but I wrote them together here to show what my code was).
Please help me.
Thanks
scalars j;
sets
o /red,green,blue/
p /b1,b2,b3,p1,p2,p3/
k /1*3/;
Table sup(*,*)
b1 b2 b3 p1 p2 p3
red 12 15 20 200 50 50
green 16 17 0 150 50 0
blue 13 18 0 100 50 0 ;
parameters Bid_Red(k),Pmax_Red(k),t;
*for statement***************
for(j= 1 to 3,
t=card(o)+j;
Bid_Red(k)$( ord(k) = j )=sup('red',j);
Pmax_Red(k)$( ord(k) = j )=sup('red',t);
);
*loop statement***************
t=card(o);
loop(k,
Bid_Red(k)=sup('red',k);
Pmax_Red(k)=sup('red',k+t);
);
display Bid_red, Pmax_Red
One of the core features of GAMS is how it deals with set structures and indexing. I'd recommend looking at the excellent documentation, for example on set definition https://www.gams.com/latest/docs/UG_SetDefinition.html, to really get a feel for how to get the best out of it.
In your case, you can proceed as follows. p is a set. Create two subsets of it, p_ and b_, using the syntax subset_name(set_name).
sets p_(p) / p1, p2, p3 /,
b_(p) / b1, b2, b3 /;
Create parameters over appropriate dimensions (i.e. the full set), and define them over the subset you are interested in:
parameters bid_red(o,p),pmax_red(o,p);
bid_red(o,b_) = sup(o,b_);
pmax_red(o,p_) = sup(o,p_);
Then display bid_red, pmax_red; gives:
---- 21 PARAMETER bid_red
b1 b2 b3
red 12.000 15.000 20.000
green 16.000 17.000
blue 13.000 18.000
---- 21 PARAMETER pmax_red
p1 p2 p3
red 200.000 50.000 50.000
green 150.000 50.000
blue 100.000 50.000
If you do want to select individual rows, you can use e.g. pmax_red('red',p_) in your code. This is essentially just a special case of subsetting in which the subset is of size 1.

Plotting data from two sets with different shapes in the same plot

I am using data collected from two different instruments which have different resolutions because of the sampling rate of each instrument. For a specific time, one of the sets has >10k entries while the other has ~2.5k. However, they capture data over the same time interval, and I want to plot them on top of each other even though they have different resolutions. The minimum and maximum x of both sets are the same; one of them simply has more entries.
Simplified it could look like this:
1st set from instrument with higher sampling rate:
time(s) value
0.0 10
0.2 11
0.4 12
0.6 13
0.8 14
... ..
100 50
2nd set from instrument with lower sampling rate:
time(s) value
0 100
1 120
2 125
3 128
4 130
. ...
100 430
They are measuring different things, but I would like to display them in the same plot. How can I accomplish this?
I found the mistake. I was trying to plot both datasets using the time data from the first instrument. Of course each dataset needs to be plotted against its own time data; I had put the first instrument's time data in the second plot by mistake.

How should I impute NaN values in a categorical column?

Should I encode a categorical column and use label encoding, then impute NaN values with most frequent value, or are there other ways?
As encoding requires converting dataframe to array, then imputing would require again array to dataframe conversion (all this for a single column, and there are more columns like that).
For example, I have the variable BsmtQual, which evaluates the height of a basement and has the following categories:
Ex Excellent (100+ inches)
Gd Good (90-99 inches)
TA Typical (80-89 inches)
Fa Fair (70-79 inches)
Po Poor (<70 inches)
NA No Basement
Out of 2919 values in BsmtQual, 81 are NaN values.
For problems you have in the future like this that don't involve coding you should post at https://datascience.stackexchange.com/.
This depends on a few things. First of all, how important is this variable in your exercise? Assuming that you are doing classification, you could try removing all rows with NaN values, running a few models, then removing the variable and running the same models again. If you haven't seen a dip in accuracy, then you might consider removing the variable completely.
If you do see a dip in accuracy or can't judge impact due to the problem being unsupervised, then there are several other methods you can try. If you just want a quick fix, and if there aren't too many NaNs or categories, then you can just impute with the most frequent value. This shouldn't cause too many problems if the previous conditions are satisfied.
If you want to be more exact, then you could consider using the other variables you have to predict the class of the categorical variable (obviously this will only work if the categorical variable is correlated with some of your other variables). You could use a variety of algorithms for this, including classifiers or clustering. It all depends on the distribution of your categorical variable and how much effort you want to put in to solve your issue.
(I'm only learning as well, but I think that's most of your options.)
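The "quick fix" of imputing with the most frequent value can be sketched in plain Python as follows. This is only an illustration under stated assumptions: missing entries are represented as `None`, the sample list is made up, and the helper name `impute_most_frequent` is hypothetical. With a DataFrame you would apply the same idea to the column directly, without converting to an array.

```python
from collections import Counter

def impute_most_frequent(values, missing=None):
    """Replace missing entries with the most frequent observed category."""
    observed = [v for v in values if v is not missing]
    if not observed:
        # Nothing to learn the mode from; return the data unchanged.
        return list(values)
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is missing else v for v in values]

# Made-up sample of BsmtQual labels with two missing entries.
bsmt_qual = ["TA", "Gd", None, "TA", "Ex", None, "Gd", "TA"]
filled = impute_most_frequent(bsmt_qual)
print(filled)
```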
"… or there are other ways."
Example:
Ex Excellent (100+ inches) 5 / 5 = 1.0
Gd Good (90-99 inches) 4 / 5 = 0.8
TA Typical (80-89 inches) 3 / 5 = 0.6
Fa Fair (70-79 inches) 2 / 5 = 0.4
Po Poor (<70 inches) 1 / 5 = 0.2
NA No Basement 0 / 5 = 0.0
However, labels express less precision (affects accuracy if combined with actual measurements).
This could be solved either by scaling values over each category's range (e.g. scaling 0 - 69 inches over 0.0 - 0.2), or by using an approximate value for each category (more linearly accurate). For example, if the highest value is 200 inches:
Ex Excellent (100+ inches) 100 / 200 = 0.5000
Gd Good (90-99 inches) (((99 - 90) / 2) + 90) / 200 = 0.4725
TA Typical (80-89 inches) (((89 - 80) / 2) + 80) / 200 = 0.4225
Fa Fair (70-79 inches) (((79 - 70) / 2) + 70) / 200 = 0.3725
Po Poor (<70 inches) (69 / 2) / 200 = 0.1725
NA No Basement 0 / 200 = 0.0000
Actual measurement 120 inch 120 / 200 = 0.6000
This produces a decent approximation (the mid-point of each range, except Ex, which uses its minimum value). If calculations on such columns produce inaccuracies, it is because of the imprecision of the notation (labels express ranges rather than values).
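The mid-point encoding above can be written as a small lookup. This is a sketch under the same assumption as the example (a highest value of 200 inches); the names `MAX_HEIGHT`, `CATEGORY_HEIGHT` and `encode` are illustrative only.

```python
MAX_HEIGHT = 200.0  # assumed highest value, as in the example above

# Representative height in inches for each label.
CATEGORY_HEIGHT = {
    "Ex": 100.0,          # open-ended range: use its minimum
    "Gd": (90 + 99) / 2,  # mid-point of 90-99
    "TA": (80 + 89) / 2,  # mid-point of 80-89
    "Fa": (70 + 79) / 2,  # mid-point of 70-79
    "Po": (0 + 69) / 2,   # mid-point of 0-69
    "NA": 0.0,            # no basement
}

def encode(label):
    """Map a BsmtQual label to a value scaled into [0, 1]."""
    return CATEGORY_HEIGHT[label] / MAX_HEIGHT

for label in CATEGORY_HEIGHT:
    print(label, round(encode(label), 4))
```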

Google Spreadsheet with SQL query - finding best combination

I have a google spreadsheet for my gaming information. It contains 2 sheets - one for monster information, another for team.
Monster information sheet contains the attack value, defend value, and the mana cost of monsters. It's almost like a database of monsters that I can summon.
Team sheet does the following:
Asks for the amount of mana I currently have.
Computes a list of up to 5 monsters that I can summon (it can be less than 5).
Each monster has their own mana cost, therefore total mana cost mustn't exceed the amount of mana I have given in point 1.
The tabulated list should give me a team that have the highest combined attack value. It does not matter how many monsters are summoned. Each monster cannot be summoned twice though.
I have been thinking of using query() function so that I can make use of SQL statements. (so that I can hopefully retrieve the tabulated list directly)
Sample: Monster Info
A B C D
1 Monster Attack Defense Cost
2 MonA 1200 1200 35
3 MonB 1400 1300 50
... ...
Sample: Team
A B C D
1 Mana 120
2
3 Attack Team
4 Monster Attack Cost Total Attack
5 MonB 1400 50 1400
6 MonA 1200 35 2600
7 ... ...
I have these formula in "Team" sheet
A5: =query('Monster Info'!$A$:$D,"SELECT A,B,D ORDER BY B DESC LIMIT 5")
B5: =CONTINUE(A5, 1, 2)
C5: =CONTINUE(A5, 1, 3)
D5: =C5
A6: =CONTINUE(A5, 2, 1)
B6: =CONTINUE(A5, 2, 2)
C6: =CONTINUE(A5, 2, 3)
D6: =D5+C6
That only gets the 5 monsters with the best attack, without considering mana cost. How do I make it take both attack value and mana cost into account? There is another problem, shown in the example below:
Example: (simplified version, without defense value etc)
Monster Attack Cost
MonA 1400 50
MonB 1200 35
MonC 1100 30
MonD 900 25
MonE 500 20
MonF 400 15
MonG 350 10
MonH 250 5
If I have 160 mana, then the obvious team is A+B+C+D+E (5100 Attack).
If I have 150 mana, it becomes A+B+C+D+G (4950 Attack).
If I have 140 mana, it becomes A+B+C+D (4600 Attack).
If I have 130 mana, it becomes B+C+D+E+F (4100 Attack using 125 mana) or A+B+C+F (4100 Attack using all 130 mana).
If I have 120 mana, it becomes B+C+D+E+G (4050 Attack).
If I have 110 mana, it becomes B+C+D+F+H (3850 Attack).
As you can see, there isn't really a pattern within the results.
Any expert willing to share their insights on this?
I've played with the problem for an hour and I only have a workaround here. Your problem seems to be a standard linear programming task (with 0/1 variables), which can easily be solved by "Solver" software. There used to be a so-called "Solver" in Google Spreadsheet, but unfortunately it was removed from the newest version. If you do not insist on a Google solution, you should try it in one of the spreadsheet applications that support a Solver.
I tried MS Office (it has a Solver add-in, installation guide: http://office.microsoft.com/en-001/excel-help/load-the-solver-add-in-HP010342660.aspx).
Before you run the solver, you should prepare your original dataset a bit, with helper columns and cells.
Add a new column next to the "Cost" column (let's assume it is column "D"), and in each row put either 0 or 1. This column will tell you whether a monster is selected for the attack team or not.
Add two more columns ("E" and "F" respectively). These columns will be the products of the Attack and of the Cost with the selection flag. So you should write a formula in the E2 cell: =b2*d2, and in the F2 cell: =c2*d2. This way, if a monster is selected (which is told by the D column, remember), the corresponding E and F cells will be non-zero values, otherwise they will be 0.
Create a SUM row under the last row, and create a summarizing function for the D,E,F columns respectively. So in my spreadsheet D10 cell gets its value like this: =sum(d2:d9), and so on.
I created a spreadsheet to show these steps: https://docs.google.com/spreadsheets/d/1_7XRlupEEwat3CthSSz8h_yJ44MysK9hMsj0ijPEn18/edit?usp=sharing
Remember to copy this worksheet to an MS Office worksheet, before you start the Solver.
Now, you are ready to start the Solver. (Data menu, Solver in MS Office). You can see a video here on using the Solver: https://www.youtube.com/watch?v=Oyc0k9kiD7o
It's not as hard as it looks, but for this case I'll describe what to write where:
Set Objective: you should select the "E10" cell, as that represents the sum of all the attack points.
Check "Max" radiobutton as we would like to maximize the value of the attacks.
By Changing variable cells: Select the "d2:d9" interval as those cells are representing whether a monster is selected or not. The solver will try to adjust these values (0, or 1) in order to maximise the sum attack.
Subject to the Constraints: Here we should add some constraints. Click on the Add button, and then:
First we should ensure that d2:d9 are all binary values. So "Cell reference" should be "d2:d9" and from the dropdown menu, select "bin" as binary.
Another constraint should be that the sum of the selected monsters should not exceed 5. So select the cell where the sum of the selected monsters is represented (D10) and add "<=" and the value "5"
Finally, we cannot use more mana than we have, so select the cell in which you store the sum of used mana (F10), choose "<=", and add the total amount of mana we can spend (in my case it's in the I2 cell).
Done. It should work, in my case it worked at least.
Hope it helps anyway.
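If a spreadsheet Solver is not available, the same selection can be checked with a few lines of Python. Note that this is a different technique from the Solver: a plain brute-force search over every team of up to 5 monsters, which is only practical for small monster lists like the simplified example in the question. The names `MONSTERS` and `best_team` are illustrative.

```python
from itertools import combinations

# name: (attack, cost), taken from the simplified example in the question.
MONSTERS = {
    "MonA": (1400, 50),
    "MonB": (1200, 35),
    "MonC": (1100, 30),
    "MonD": (900, 25),
    "MonE": (500, 20),
    "MonF": (400, 15),
    "MonG": (350, 10),
    "MonH": (250, 5),
}

def best_team(mana, monsters=MONSTERS, max_size=5):
    """Return (total_attack, team) maximising attack within the mana budget."""
    best = (0, ())
    for size in range(1, max_size + 1):
        # combinations() never repeats a monster, so none is summoned twice.
        for team in combinations(monsters, size):
            cost = sum(monsters[m][1] for m in team)
            attack = sum(monsters[m][0] for m in team)
            if cost <= mana and attack > best[0]:
                best = (attack, team)
    return best

for mana in (160, 150):
    print(mana, best_team(mana))
```

When several teams tie on attack, this sketch keeps the first one it finds; a tie-break rule (for example, lowest mana used) would need an extra comparison.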

How to Resize using Lanczos

I can easily calculate the values for sinc(x) curve used in Lanczos, and I have read the previous explanations about Lanczos resize, but being new to this area I do not understand how to actually apply them.
To resample with lanczos imagine you overlay the output and input over each other, with points signifying where the pixel locations are. For each output pixel location you take a box +- 3 output pixels from that point. For every input pixel that lies in that box, calculate the value of the lanczos function at that location with the distance from the output location in output pixel coordinates as the parameter. You then need to normalize the calculated values by scaling them so that they add up to 1. After that multiply each input pixel value with the corresponding scaling value and add the results together to get the value of the output pixel.
For example, what does "overlay the input and output" actually mean in programming terms?
In the equation given
lanczos(x) = {
0 if abs(x) > 3,
1 if x == 0,
else sin(x*pi)/x
}
what is x?
As a simple example, suppose I have an input image with 14 values (i.e. in addresses In0-In13):
20 25 30 35 40 45 50 45 40 35 30 25 20 15
and I want to scale this up by 2, i.e. to an image with 28 values (i.e. in addresses Out0-Out27).
Clearly, the value in address Out13 is going to be similar to the value in address In7, but which values do I actually multiply to calculate the correct value for Out13?
What is x in the algorithm?
If the values in your input data are at t coordinates [0 1 2 3 ...], then your output (which is scaled up by 2) has t coordinates at [0 0.5 1 1.5 2 2.5 3 ...]. So to get the first output value, you center your filter at 0 and multiply by all of the input values. Then to get the second output, you center your filter at 1/2 and multiply by all of the input values. Etc ... In other words, x is the distance between the coordinate the filter is centered on and the coordinate of each input value.
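To make this concrete, here is a minimal 1-D sketch in Python for the 14-value example from the question. Some assumptions to flag: it uses the commonly cited Lanczos kernel sinc(x)*sinc(x/a) with a = 3, which is not identical to the simplified formula quoted in the question; it handles upscaling only (for downscaling the kernel would have to be stretched); and near the edges it simply ignores input positions that do not exist and renormalizes. The function names are illustrative.

```python
import math

def lanczos(x, a=3):
    """Lanczos kernel sinc(x) * sinc(x / a), zero outside |x| < a."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def upscale(samples, factor=2, a=3):
    """Upscale a 1-D signal by an integer factor with Lanczos weights."""
    out = []
    for i in range(len(samples) * factor):
        t = i / factor  # output position expressed in input coordinates
        first = math.floor(t) - a + 1
        last = math.floor(t) + a
        total = 0.0
        acc = 0.0
        for j in range(first, last + 1):
            if 0 <= j < len(samples):   # skip positions outside the input
                w = lanczos(t - j, a)   # x = distance from output to input
                total += w
                acc += w * samples[j]
        out.append(acc / total)         # normalize so the weights sum to 1
    return out

values = [20, 25, 30, 35, 40, 45, 50, 45, 40, 35, 30, 25, 20, 15]
result = upscale(values)
print(len(result), result[13], result[14])
```

Here Out13 sits at t = 6.5, so it is a weighted mix of In4 to In9 with x = 2.5, 1.5, 0.5, -0.5, -1.5, -2.5, while Out14 sits exactly on In7 and reproduces its value up to rounding.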