Loop commands to perform same functions over different columns - numpy

I have a large data set for which I need to calculate the max, min, mean, std, and the 10th and 90th percentiles over several columns, from column 7 to column 26. What commands do I need in order to loop over a range of columns rather than just an individual column?
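One possible approach, sketched under assumptions: the data is already loaded as a 2-D NumPy array, and "7 - 26" means 0-based column indices 7 through 26 inclusive (shift both bounds by one if the columns are counted from 1). The function name `column_stats` and the random stand-in data are made up for illustration. Instead of an explicit loop, each statistic is computed for all selected columns at once with `axis=0`.

```python
import numpy as np

def column_stats(data, first_col=7, last_col=26):
    """Summary statistics for every column in first_col..last_col (inclusive, 0-based)."""
    block = data[:, first_col:last_col + 1]   # slice out the column range once
    return {
        "max": block.max(axis=0),
        "min": block.min(axis=0),
        "mean": block.mean(axis=0),
        "std": block.std(axis=0),
        "p10": np.percentile(block, 10, axis=0),
        "p90": np.percentile(block, 90, axis=0),
    }

# Hypothetical stand-in for the real data set: 1000 rows, 30 columns.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 30))
stats = column_stats(data)
print(stats["mean"].shape)  # (20,) : one value per selected column
```

Each entry of the returned dictionary is a 1-D array with one value per column, so `stats["p90"][0]` is the 90th percentile of column 7.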

Related

Excel VBA - Looping to count cell values larger than / smaller than specific value

I'm currently working with a large amount of contract lists. Each list is in a separate worksheet for each year's quarter and it needs to stay that way. E.g.: 2007_Quarter1, 2007_Quarter2, etc. I have 10 years of data, so 40 quarter report worksheets.
Now I need to code a vba routine that would do the following, in the specified order:
In each quarter report worksheet, run through the column containing contract values; then
Count all values between 0 and 10,000; then
Count all values between 10,001 and 25,000; and so on
Then, when there are no more contract values in the range (which will vary from sheet to sheet), go to the next worksheet and repeat the procedure.
All results should be returned in a worksheet (let's call it Worksheets("Report")) in a table where rows are for value intervals (e.g. 0 to $10,000) and columns are for quarters (e.g. 2007 Q1, 2007 Q2, etc.).
One particular aspect of my problem is that, for practical reasons, I'd prefer to set the control intervals' min and max values in a table contained in another worksheet ("Variables"), where, for example, I'd have all minimum values in range C4 down and all maximum values in range D4 down.
For each contract value, an "If" condition should look like:
If ContractVal > Worksheets("Variables").Range("C4").Value And _
   ContractVal < Worksheets("Variables").Range("D4").Value Then ...
So far I'm having no success at coding this efficiently. I suspect some loop would work but I can't find a way to make it happen. A loop would do:
In 2007_Quarter1 ws:
For each cell from A4 down to the end, count values included between Range("C4").Value and Range("D4").Value
Then for each cell from A4 down to the end, count values included between Range("C5").Value and Range("D5").Value
... and so on, then repeat for 2007_Quarter2, 2007_Quarter3, and so on.
I need rescue! Thanks in advance!
If it is OK for you to prepare the resulting matrix manually (intervals in rows and quarters in columns), you could write a macro that reads the right file and counts the contracts in the right range for every cell in the matrix.
If you want to build the matrix automatically, you should tell the macro the year range. Then you can write a loop that reads the ranges and, for every range, an inner loop that reads every file in the right order.
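The nested-loop idea can be sketched outside VBA as follows. This is an illustration in Python, not the macro itself: the function name `count_by_interval`, the sample values, and the use of inclusive bounds are all assumptions (the asker's `If` test uses strict comparisons, so adjust as needed).

```python
def count_by_interval(sheets, intervals):
    """Return {(low, high): {sheet_name: count}} using inclusive bounds."""
    report = {}
    for low, high in intervals:                      # outer loop: one report row per interval
        report[(low, high)] = {
            name: sum(1 for value in values if low <= value <= high)
            for name, values in sheets.items()       # inner loop: one report column per quarter
        }
    return report

# Hypothetical contract values per quarter sheet.
sheets = {
    "2007_Quarter1": [500, 9_999, 12_000, 25_000],
    "2007_Quarter2": [10_000, 10_001, 30_000],
}
# Hypothetical stand-in for the min/max table on the "Variables" sheet.
intervals = [(0, 10_000), (10_001, 25_000)]
report = count_by_interval(sheets, intervals)
```

The resulting dictionary has the same shape as the requested report table: intervals as rows, quarters as columns.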

Grabbing Top 5 Events From Multiple Columns in Excel

My file has four sets of columns I want to grab the top 5 out of. There are four columns with names and four columns with total dollar amounts in them. I assume the issue I am having is that the MATCH() formula grabs the row number, while each row could have up to four dollar amounts and names in it, which creates an #N/A error when I try it. The formula I am trying is:
=INDEX(Franchise,MATCH(I37,Totals,0))
The Franchise being the four columns of names and the Totals being the four columns of totals.
At this point I am stumped.
How would I go about creating that formula?
You can see where I want the formula and results to post at the top.
Here is the file.
You need to get the top 5 Totals and use those to retrieve the Qx (or other) label. The LARGE function does not like discontiguous cell ranges but the newer AGGREGATE¹ function has a LARGE subfunction (14) and you can force it to ignore errors with option 6. Forcing anything that isn't in column D, H, L or P into a #DIV/0! error will discard them from any calculation.
In M2 use this standard formula:
=AGGREGATE(14, 6, $D$10:$P$35/NOT(MOD(COLUMN($D:$P), 4)), ROW(1:1))
In K2 use this standard formula to retrieve the Qx label:
=IFERROR(INDEX($C$10:$C$35, MATCH(M2, $D$10:$D$35, 0)),
IFERROR(INDEX($G$10:$G$35, MATCH(M2, $H$10:$H$35, 0)),
IFERROR(INDEX($K$10:$K$35, MATCH(M2, $L$10:$L$35, 0)),
INDEX($O$10:$O$35, MATCH(M2, $P$10:$P$35, 0)))))
Fill K2:M2 down to K6:M6. Your results should resemble the following.
        
Caveat - If there are ties in the top 5 amounts, a more complicated formula would have to be devised to account for multiple events with identical totals.
¹ The AGGREGATE function was introduced with Excel 2010. It is not available in earlier versions.
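For readers who want to check the logic outside Excel, the same idea (pool the totals from all four column pairs, ignore blanks, take the five largest, then look up the matching label) can be sketched in Python. The function name `top_events`, the labels, and the amounts are invented for illustration; as with the formulas above, ties are not handled in any special way.

```python
import heapq

def top_events(rows, n=5):
    """rows: list of rows, each a list of (name, total) pairs; total may be None."""
    pairs = [
        (name, total)
        for row in rows
        for name, total in row
        if total is not None          # skip empty cells, like option 6 skips errors
    ]
    # n largest pairs ranked by total only
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])

# Hypothetical sheet: two rows, four (name, total) pairs each.
rows = [
    [("Q1", 100), ("Q2", 250), ("Q3", None), ("Q4", 75)],
    [("Q1", 300), ("Q2", 50), ("Q3", 125), ("Q4", 400)],
]
top5 = top_events(rows)
```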

AVG of Cells Next to Cells used in Another Formula

I am new to asking questions here so I hope I get this correct. I am helping my dad with a spreadsheet and I'm having issues with figuring out how to do one formula. I don't know if it can be done with a formula or if it has to be done with macros.
This is a scoring sheet with multiple matches. For each match there is a total score and the cell next to the score is an X count (number of bullseyes). In the same row (column K) I calculate the top 6 total scores and average them:
=AVERAGE(LARGE((N15,Q15,T15,W15,Z15,AC15,AF15,AI15,AL15,AO15,AR15,AU15,AX15,BA15,BD15,BG15,BJ15),{1,2,3,4,5,6}))
Now I need to take the AVG of the X counts that are next to the total scores that are used in the formula above and put solution in column L.
For example, if the cells that are used for AVG score in that row are:
N15,Q15,T15,W15,Z15,AC15
then the cells that would need to be used for the X count AVG would be:
O15,R15,U15,X15,AA15,AD15
This result would be put into L15
Please help. If any clarification is needed just let me know.
Screen Shot:
Please try the following formula:
=SUMPRODUCT(O15:BM15,
--(MOD(COLUMN(N15:BL15)-COLUMN($N15),3)=0),
--(N15:BL15+O15:BM15/10^3+COLUMN(N15:BL15)/10^6>=
LARGE(N15:BL15+O15:BM15/10^3+COLUMN(N15:BL15)/10^6,6))
)/6
How does it work?
SUMPRODUCT takes 3 arguments here: the first is the array to sum, and the other two each return an array of 0s and 1s that selects only the relevant elements of the first array.
--(MOD(COLUMN(N15:BL15)-COLUMN($N15),3)=0)
This part avoids listing every single cell. Since the score is in every third column of the input range, we calculate each column number relative to the first column. MOD(relative column, 3) returns {0,1,2,0,1,2,...}, the comparison with 0 turns that into {TRUE,FALSE,FALSE,TRUE,...}, and the double negation converts it to {1,0,0,1,0,0,...}. So only every third column of the input array is included in the sum.
--(N15:BL15+O15:BM15/10^3+COLUMN(N15:BL15)/10^6>=
LARGE(N15:BL15+O15:BM15/10^3+COLUMN(N15:BL15)/10^6,6))
This part decides which 6 of the scores are included in the final sum. The trickiest part is deciding what to do with ties. My approach is:
if two scores are the same, take the one with the higher number of bullseyes
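if it is still tied, the column number decides; note that adding COLUMN(...)/10^6 gives the later column the larger key, so the later column is the one taken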
This means that instead of N15 value we calculate:
N15+O15/10^3+COLUMN(N15)/10^6
With your sample data it evaluates to 566.017014: the first three decimal places hold the number of bullseyes and the next three hold the column number.
You can use the same formula to calculate average of top 6 scores by changing the first parameter:
=SUMPRODUCT(N15:BL15,
--(MOD(COLUMN(N15:BL15)-COLUMN($N15),3)=0),
--(N15:BL15+O15:BM15/10^3+COLUMN(N15:BL15)/10^6>=
LARGE(N15:BL15+O15:BM15/10^3+COLUMN(N15:BL15)/10^6,6))
)/6
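The selection logic of these two formulas can be sketched in Python to make the tie-breaking explicit. This is an illustration only: `average_x_of_top_scores` and the sample numbers are hypothetical, and the lists hold just the score cells and their neighbouring X counts, so no MOD mask is needed. Sorting by the tuple (score, X count, position) plays the role of the composite key score + X/10^3 + column/10^6, in which a larger column number gives a larger key.

```python
def average_x_of_top_scores(scores, x_counts, n=6):
    """Average the X counts that sit next to the n best scores.

    Ties on score are broken by X count, then by position (a later
    position gives a larger key), as in the composite key of the formula.
    """
    ranked = sorted(
        range(len(scores)),
        key=lambda i: (scores[i], x_counts[i], i),
        reverse=True,
    )
    top = ranked[:n]
    # Divide by n, as the formula divides by 6 regardless of how many scores exist.
    return sum(x_counts[i] for i in top) / n

# Hypothetical match results: two scores of 566 with different X counts.
scores = [566, 570, 560, 566, 580, 590, 575, 550]
x_counts = [17, 20, 10, 19, 25, 30, 22, 5]
average_x = average_x_of_top_scores(scores, x_counts)
```

With `n=5` only one of the two 566 scores fits into the selection, and the one with 19 X's is chosen over the one with 17.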
You can try this not so elegant solution:
=SUMPRODUCT(INDEX(N15:BK15,MATCH(LARGE((N15,Q15,T15,W15,Z15,AC15,AF15,AI15,AL15,AO15,AR15,AU15,AX15,BA15,BD15,BG15,BJ15),{1,2,3,4,5,6}),N15:BK15,0)+1))/6
Enter it as an array formula with Ctrl+Shift+Enter in cells L15:M15 (2 cells); it should then look like this:
{=SUMPRODUCT(INDEX(N15:BK15,MATCH(LARGE((N15,Q15,T15,W15,Z15,AC15,AF15,AI15,AL15,AO15,AR15,AU15,AX15,BA15,BD15,BG15,BJ15),{1,2,3,4,5,6}),N15:BK15,0)+1))/6}
with added braces.
The number 6 equals the number of top scores you want returned.
Now, why 2 cells (L15:M15)? I cannot make SUMPRODUCT evaluate the resulting array from the INDEX, so we have to enter it in 2 cells. I don't think that will be a problem since, in your screenshot, column M is not used.
Note: If the range evaluated has fewer than 6 items, it will error out. Also, good point by user3964075: it may or may not be able to deal with ties.

Trailing Average Using AverageIf in Excel

I am trying to find the average for the last 3 instances only. I am using the AVERAGEIF function and it calculates the average for the entire range, but I need it to calculate only for the last 3 instances it finds (or fewer if fewer than 3 are available). I need the entire column for G and H to have the average for the last 3 games that the Team played.
This is what I have:
=AVERAGEIF(B3:C17,B17,D3:E17)
You can do this with array formulas (They have to be entered using the keys Ctrl+Shift+Enter)...
Basic steps are:
Find the row (including and above current) that is the third highest row number containing the team name (or use row 1 otherwise)
Use the INDIRECT ranges in your AVERAGEIF from B-that_row to C-current_row and D_that_row to E-current_row
So in cell F17 you would have the formula
{=AVERAGEIF(INDIRECT("B"&LARGE(IF(--($B$3:B17=B17)+($C$3:C17=B17),ROW($B$3:B17),1),3)&":"&CELL("address",C17)),B17,INDIRECT("D"&LARGE(IF(--($B$3:B17=B17)+($C$3:C17=B17),ROW($B$3:B17),1),3)&":"&CELL("address",E17)))}
We repeat some of the logic, because we have two ranges (criteria range and average range).
IF(--($B$3:B17=B17)+($C$3:C17=B17),ROW($B$3:B17),1) means that if column B or (using +) column C has the value in B17, give me the row number, otherwise 1 (our <3 case... we could make this 3, the first row of team names)
LARGE(...,3) will give us the third highest of this array --> the third highest row number having our team name
INDIRECT("B"&...&":"&CELL("address",C17)) is going to give us the range using our third highest row number to the current row, columns B and C
then we do exactly the same thing as you were doing in AVERAGEIF but using this INDIRECT range and the equivalent for columns D and E
Fun question! Good luck. And remember to use Ctrl+Shift+Enter to enter it!
EDIT The above was giving an #NUM! error for the first two rows - that was because the LARGE function was trying to get the third largest in an array of 2! Also noticed that there were some cases where the column letter needed to be absolute (i.e. $) for copying to the Away column. So the updated formula:
{=AVERAGEIF(INDIRECT("B"&LARGE(IF(--($B$3:$B17=B17)+($C$3:$C17=B17),ROW($B$3:$B17),1),MIN(3,ROW()-2))&":"&CELL("address",$C17)),B17,INDIRECT("D"&LARGE(IF(--($B$3:$B17=B17)+($C$3:$C17=B17),ROW($B$3:$B17),1),MIN(3,ROW()-2))&":"&CELL("address",$E17)))}
Replaced the 3 with MIN(3,ROW()-2) so that we get 3 if there are, but 1 or 2 if we are in one of the first two data rows
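To sanity-check the results of the array formula, the intended calculation can be sketched in Python. The layout is an assumption based on the question: each game has a home team, an away team, and one stat for each (columns B, C, D and E). The function name `trailing_average` and the sample games are made up.

```python
def trailing_average(games, team, upto, n=3):
    """Average of the team's last n stats in games[0..upto] (inclusive).

    games: list of (home, away, home_stat, away_stat) tuples.
    Returns None if the team has not played yet.
    """
    stats = []
    for home, away, home_stat, away_stat in games[: upto + 1]:
        if home == team:
            stats.append(home_stat)
        if away == team:
            stats.append(away_stat)
    recent = stats[-n:]               # fewer than n entries early in the season
    return sum(recent) / len(recent) if recent else None

# Hypothetical schedule.
games = [
    ("A", "B", 10, 20),
    ("C", "A", 30, 40),
    ("A", "C", 50, 60),
    ("B", "A", 70, 80),
    ("A", "B", 90, 100),
]
latest = trailing_average(games, "A", upto=4)   # averages 50, 80 and 90
```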
OK I posted this prematurely and attempted to delete it when I realised it wouldn't work. It should work now.... providing you add another condition which is the game dates in column A. Remember that this is an array formula so hit ctrl+shift+enter. Dates in column A; teams in column B; stats in column D. This formula can reside somewhere permanent on the sheet so you can enter the team name (shown as F13 here) to get the three most recent stats.
=AVERAGE(VLOOKUP(LARGE(IF(B3:B24=F13,A3:A24),1),A3:D24,4),VLOOKUP(LARGE(IF(B3:B24=F13,A3:A24),2),A3:D24,4),VLOOKUP(LARGE(IF(B3:B24=F13,A3:A24),3),A3:D24,4))

Divide rows and then transfer divided values in new worksheet (values: time periods & amounts)

Hope you can help with this:
In worksheet "TOTALS" and Range "M11:N251" there are "from:to" time periods inserted by user ("From" is in "M11:M" and "To" is in "N11:N"). Number of rows (and inserted time periods) may vary. Time periods are quadrimesters (4 months) but not always 120 days (could be 119, 122 or something like that; if less than 100, the user must not be allowed to divide). Range "W11:W251" hosts amounts of money. So, for example, "M11:N11" can be 1/1/13-1/5/13 (dd/mm/yy) and W11 just a number (i.e. 98,45).
I want to be able to divide each time period almost by 2 (first part can be 60 days, second part rest of days) and amounts accordingly (depending on the number of days of its divided period) and transfer the divided periods and amounts to a new worksheet (TOTALS2) to -let's say- range "A11:B251" & "G11:G251").
So, in the above example, in "TOTALS2" we'll have 1/1/13-1/3/13 in "A11:B11" and 1/3/13-1/5/13 in "A12:B12", and the divided amounts accordingly to the counted days of "A11:B11" & "A12:B12" in "G11" and "G12".
And so on until there are no more time periods & amounts in TOTALS to divide and transfer to TOTALS2.
How can this be done? Ideas?
Thanks in advance!
If you want to check out the file, then download it:
http://www.sendspace.com/file/5kudcy
Based on the file you give in your link where the values in TOTALS start at row 10 (rather than 11 as stated), the prices are in column S (rather than W as stated) and the values in TOTALS2 are to appear in row 10 onwards (rather than row 11 onwards as stated) the following approach works:
Add 5 columns to worksheet TOTALS. These 5 columns have a header in row 9 and formulae in rows 10, 11, 12, etc. Using columns X, Y, Z, AA and AB, the header values in X9:AB9 are
Period, Int.To, Int.From, Proportion1, Proportion2
(where Int. is an abbreviation for intermediate.)
The formulae to use in cells X10 to AB10 are:
X10: =N10-M10+1 Calculates number of days from date M10 to date N10, inclusive.
Y10: =M10+59 Calculates end date of the 60 day period starting on date M10 (i.e calculates last date of first "half period")
Z10: =1+Y10 Calculates day after date in Y10 (start date of second "half period")
AA10: =60/X10 Proportion of whole period that is accounted for by first half-period
AB10: =1-AA10 Proportion of whole period that is accounted for by second half period
The formulae in X10:AB10 can be copied down for as many rows as contain data in worksheet TOTALS, to get something like...
The additional columns in TOTALS now provide the information that you need to split each quadrimester into two "halves". Cols M, N, Y and Z provide the date information and S, AA and AB the values for splitting each quadrimester's costs. To get it to display as desired in TOTALS2 you will also need to add a couple of columns to TOTALS2. I've used columns L and M with the following headers in L9 and M9:
Source, Part
In both L10 and L11 insert the value 1, whilst in M10 and M11 insert the values 1 and 2, respectively. Now add the following formulae to L12 and M12
L12: =1+L10
M12: =M10
Copy the formulae in L12 and M12 down so that columns L and M contain values in at least twice the number of rows (in TOTALS) that you wish to split into first half and second half periods. You should end up with the sequence 1,1,2,2,3,3,4,4 etc in column L and 1,2,1,2,1,2,1,2 etc in M.
The Source column (L) indicates which data row in TOTALS (1=first, 2=second, 3=third, etc) acts as the source of the values to be split in "half" and the Part column (M) indicates which "half" it is (1=first, 2=second).
All that remains is to put it all together using appropriate formulae in columns A, B and G of TOTALS2. The formulae to insert into A10, B10 and G10 are:
A10: =OFFSET(IF(M10=1,TOTALS!M$9,TOTALS!Z$9),L10,0)
B10: =OFFSET(IF(M10=1,TOTALS!Y$9,TOTALS!N$9),L10,0)
G10: =OFFSET(TOTALS!S$9,L10,0)*OFFSET(TOTALS!Z$9,TOTALS2!L10,M10)
Copy these formulae down the rows in TOTALS2 and it should be looking like...
The formulae in col A pick the start dates of the "half" periods using either col M or col Z from the correct Source row in TOTALS, according to whether the Part value is 1 or 2. Similarly, the formulae in col B pick the end dates using cols Y and N of TOTALS. The formulae in G multiply the cost value (col S of TOTALS) by the correct proportion from either col AA or col AB in TOTALS, again according to whether Part is 1 or 2.
I haven't included refinements such as:
preventing periods which are too short (or long) in TOTALS from being split (hint: you can detect this using the Period column, X, in worksheet TOTALS) or
controlling the number of displayed rows in TOTALS2 so that it is exactly twice the number of data rows entered in TOTALS (not too difficult and several ways it can be approached) or
rounding the calculated costs for the two "halves" to 2 decimal places AND making sure that they add back to the original quadrimester cost (it is formatting in the worksheet that is causing only 2 decimals to be displayed and it is not guaranteed that the displayed values will sum precisely to their original source cost - the examples chosen just happen to have this property. Again not too difficult to solve.)
However, the above is a basic solution on which you can build.
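As a cross-check of the helper-column arithmetic, here is a minimal Python sketch of the same split. It mirrors the formulae in X10:AB10 (inclusive day count, first part of 60 days, amounts in proportion to days) and leaves out the refinements listed above. The function name `split_period` is made up; the dates and amount are the example from the question.

```python
from datetime import date, timedelta

def split_period(start, end, amount, first_days=60):
    """Split [start, end] into a first part of first_days days and the rest."""
    total_days = (end - start).days + 1                    # like X10: inclusive day count
    first_end = start + timedelta(days=first_days - 1)     # like Y10: last day of first part
    second_start = first_end + timedelta(days=1)           # like Z10: first day of second part
    first_share = first_days / total_days                  # like AA10
    return [
        (start, first_end, amount * first_share),
        (second_start, end, amount * (1 - first_share)),   # like AB10
    ]

# Example period from the question: 1/1/13 to 1/5/13 (dd/mm/yy), amount 98.45.
parts = split_period(date(2013, 1, 1), date(2013, 5, 1), 98.45)
for part_start, part_end, part_amount in parts:
    print(part_start, part_end, round(part_amount, 2))
```

Rounding is applied only when printing, so the two unrounded amounts still add up to the original amount.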