Further to this thread:
I am trying to write a formula that returns the maximum number of consecutive non-blank, non-"W" entries in column B of the array shown here.
This formula returns the maximum number of consecutive non-"W" occurrences in the array:
=ArrayFormula(MAX(FREQUENCY(if(Result<>"w", ID), if(Result<>"w", 0, ID))))
And I was hoping this formula would return the max consecutive run of non-W (<>"W") and non-blank (<>"") values, but for some reason it doesn't return the expected output:
=ArrayFormula(MAX(FREQUENCY(if(Result<>"w", if(Result<>"", ID)), if(Result<>"w", if(Result<>"", 0, ID)))))
I understand that for this example I could just search for "L"s; however, I am dealing with much more complicated arrays that have multiple values and blank lines, so solving this issue should help me solve a much bigger dataframe issue I'd like to share with my colleagues on spreadsheets.
PS: If you think the solution is best in Apps Script, I'm happy for any guidance.
Thanks a lot in advance.
I have a pandas dataframe that I've extracted from a json object using pd.json_normalize.
It has 4 rows and over 60 columns, and with the exception of the 'ts' column there are no columns where there is more than one value.
Is it possible to merge the four rows together to give one row which can then be written to a .csv file? I have searched the documentation and found no information on this.
To give context, the data is a one-time record from a weather station; I will have records at 5-minute intervals and need to put all the records into a database for further use.
I've managed to get the desired result. It's a little convoluted, and I would expect that there is a much more succinct way to do it, but I basically manipulated the dataframe, replaced all NaNs with zero, replaced some strings with ints and added the columns together, as shown in the code below:
import json
import pandas as pd

# Read the first line of the file as a JSON object
with open(fname, 'r') as d:
    ws = json.loads(next(d))

# Flatten the nested sensor data into a 4-row dataframe
df = pd.json_normalize(ws['sensors'], record_path='data')

# Stack the four rows side by side as columns a-d
df3 = pd.concat([df.iloc[0], df.iloc[1], df.iloc[2], df.iloc[3]], axis=1)
df3.rename(columns={0: 'a', 1: 'b', 2: 'c', 3: 'd'}, inplace=True)

# Zero out NaNs, the duplicated timestamps and the string values
# so the four columns can be summed
df3 = df3.fillna(0)
df3.loc['ts', ['b', 'c', 'd']] = 0
df3.loc[['ip_v4_gateway', 'ip_v4_netmask', 'ip_v4_address'], 'c'] = 0

# Combine the four columns into one and transpose back to a single row
df3['comb'] = df3['a'] + df3['b'] + df3['c'] + df3['d']
df3.drop(columns=['a', 'b', 'c', 'd'], inplace=True)
df3 = df3.T
As has been said by quite a few people, the documentation on this is very patchy, so I hope this may help someone else who is struggling with this problem!
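For comparison, here is a more succinct sketch (untested against the original JSON, so treat it as an assumption): since every column except 'ts' holds at most one non-null value, back-filling and keeping the first row takes the first non-null value in each column.

import json
import pandas as pd

with open(fname, 'r') as d:
    ws = json.loads(next(d))

df = pd.json_normalize(ws['sensors'], record_path='data')
merged = df.bfill().iloc[[0]]             # one row: first non-null value per column
merged.to_csv('record.csv', index=False)  # 'record.csv' is a placeholder name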
I have an issue of speed (apologies for the long post). I am using Excel 2013 and 2016 for Windows.
I have a workbook that performs 10,000+ calculations on a 200,000 cell table (1000 rows x 200 columns).
Each calculation returns an integer (e.g. count of filtered rows) or more usually a percentage (e.g. sum of value of filtered rows divided by sum of value of rows). The structure of the calculation is variations of the SUMPRODUCT(COUNTIFS()) idea, along the lines of:
=IF($B6=0,
0,
SUMPRODUCT(COUNTIFS(
Data[CompanyName],
CompanyName,
Data[CurrentYear],
TeamYear,
INDIRECT(VLOOKUP(TeamYear&"R2",RealProgress,2,FALSE)),
"<>"&"",
Data[High Stage],
NonDom[NonDom]
))
/$B6
)
Explaining the above:
The pair Data[CompanyName] and CompanyName is the column in the table and the condition value for the first filter.
The pair Data[CurrentYear] and TeamYear are the same as above and constitute the second filter.
The third pair looks up an intermediary table and returns the name of the column; the condition ("<>"&"") is 'not blank', i.e. it returns all rows that have a value in this column.
Finally, the fourth pair is similar to the third above, but returns the rows where Data[High Stage] matches any of the set of values in NonDom[NonDom].
Lastly, the four filters are joined together with AND logic (COUNTIFS conditions are ANDed).
It is important to note that across all the calculations the same principle is applied of using SUMPRODUCT(COUNTIFS()) – however there are many variations on this theme.
At present, using Calculate on a select range of sheets (rather than the slower full-workbook calculation) yields a calculation time of around 30-40 seconds. Not bad, and tolerable, as calculations aren't performed all the time.
Unfortunately, the model is to be extended and could now approach 20,000 rows rather than 1,000. Calculation performance is directly linked to the number of rows or cells, so I expect performance to plummet!
The obvious solution [1] is to use arrays, ideally passing an array, held in memory, to the formula in the cell and then processing it along with the filters and their conditions (the lookup filters being arrays too).
The alternative solution [2] is to write a UDF using arrays, but reading around the internet the opinion is that UDFs are much slower than native Excel functions.
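For illustration, a minimal sketch of what solution [2] might look like (the names and column positions are assumptions, not the real model): read the table into a Variant array once, then count matching rows in memory.

Public Function CountMatches(rng As Range, company As String, teamYear As String) As Long
    ' Hypothetical UDF sketch: one bulk read, then an in-memory scan
    Dim data As Variant, i As Long, n As Long
    data = rng.Value2                       ' single read of the whole table
    For i = 1 To UBound(data, 1)
        ' Columns 1 and 2 are assumed to hold CompanyName and CurrentYear
        If data(i, 1) = company And data(i, 2) = teamYear Then n = n + 1
    Next i
    CountMatches = n
End Function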
Three questions:
Is solution [1] possible and the best way of doing this, and if so, how would I construct it?
If solution [1] is not possible or not the best way, does anyone have any thoughts on how much quicker solution [2] might be compared with my current solution?
Are there other better solutions out there? I know about Power BI Desktop, PowerPivot and PowerQuery – however this is a commercial application for use by non-Excel users and needs to be presented in the current Excel ‘grid’ form of rows and columns.
Thanks so much for reading!
Addendum: I'm going to try running an array calculation for each sheet on the Worksheet.Activate event and see if there's some time savings.
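For the addendum, a bare-bones sketch of that idea (the named range is hypothetical):

Private Sub Worksheet_Activate()
    ' Recalculate only this sheet's formula block when the sheet is activated
    Me.Range("CalcBlock").Calculate   ' "CalcBlock" is an assumed named range
End Sub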
Reading data into arrays is normally a good idea if you're looking to increase speed. It is done like this:
Dim myTable As ListObject
Dim myArray As Variant

'Point the table variable at the ListObject
Set myTable = ActiveSheet.ListObjects("Table1")

'Read the table's data body into a Variant array (one bulk read)
myArray = myTable.DataBodyRange
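From there, a typical pattern (illustrative only; the column index is an assumption) is to loop over the in-memory array rather than the cells:

Dim r As Long, total As Double
For r = 1 To UBound(myArray, 1)
    total = total + myArray(r, 1)   ' first column of the table
Next r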
Disclaimer: Numbers below are randomly generated
What I'm trying to do, purely in VBA, is take the ratio [column B]/[column A] and check whether the ratio in row 10 (= 1,241/468) is below the minimum or above the maximum of the ratios in rows 1 through 9, but only compared to the rows where there is a 1 in column C.
That is, compare Cell(B10)/Cell(A10) to Cell(B2)/Cell(A2), Cell(B3)/Cell(A3), etc. (only comparing against rows with a 1 in column C).
The workbook I'm working with has a lot more data and columns, and I'm not allowed to explicitly edit the cells, so defining a new column is out of the question. Is there a way to do this in VBA such that it essentially returns a Boolean depending on whether or not the ratio in the last row violates the threshold defined above?
You can obtain the minimum and maximum ratios (with criteria) easily with the AGGREGATE¹ function's SMALL and LARGE sub-functions.
The formulas in D13:E13 are,
=AGGREGATE(15, 6, ((B1:B9)/(A1:A9))/C1:C9, 1)
=AGGREGATE(14, 6, ((B1:B9)/(A1:A9))/C1:C9, 1)
The 6 is the AGGREGATE parameter for ignoring error values. By dividing the ratio by the value in column C, we produce #DIV/0! errors for anything we do not want considered, leaving those rows ignored. If the values in C were more diverse, we could divide by (C1:C9=1) to produce the same results.
Since we are using the SMALL and LARGE sub-functions, we can easily retrieve the second, third, etc. ratios by increasing the k parameter (the 1 off the back end).
I've modified some of the values in your sample slightly to demonstrate that the min and max with criteria are being picked up correctly.
These can be adapted to VBA with the WorksheetFunction object or Application.Evaluate method.
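For example, a minimal sketch with Application.Evaluate (the ranges are the ones above; adjust them to the real workbook) that returns True when the row-10 ratio falls outside the criteria rows' min/max:

Public Function RatioOutOfBounds() As Boolean
    Dim lo As Double, hi As Double, r As Double
    ' Evaluate the same AGGREGATE formulas shown above
    lo = Application.Evaluate("AGGREGATE(15,6,((B1:B9)/(A1:A9))/C1:C9,1)")
    hi = Application.Evaluate("AGGREGATE(14,6,((B1:B9)/(A1:A9))/C1:C9,1)")
    r = Range("B10").Value / Range("A10").Value
    RatioOutOfBounds = (r < lo) Or (r > hi)
End Function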
¹ The AGGREGATE function was introduced with Excel 2010. It is not available in earlier versions.
I'm trying to calculate the 99.5th percentile for a data set of 100,000 values in an array (arr1) within VBA, using the Percentile function as follows:
Pctile = Application.WorksheetFunction.Percentile(arr1, 0.995)
Pctile = Application.WorksheetFunction.Percentile_Inc(arr1, 0.995)
Neither works and I keep getting a type mismatch (13).
The code runs fine if I limit the array size to a maximum of 65,536. As far as I was aware, calculation has been limited only by available memory since Excel 2007, and array sizes passed to a macro have been limited only by available memory since Excel 2000.
I'm using Excel 2010 on a high-performance server. Can anyone confirm this problem exists? Assuming so, I figure that my options are to build a VBA function to calculate the percentile 'manually', or to output to a worksheet, calculate it there and read it back. Are there any alternatives, and what would be quickest?
The error would occur if arr1 is one-dimensional and has more than 65,536 elements (see Charles' answer in Array size limits passing array arguments in VBA). Dimension arr1 as a two-dimensional array with a single column:
Dim arr1(1 To 100000, 1 To 1)
This works in Excel 2013. Based on Charles' answer, it appears that it will work with Excel 2010.
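A minimal sketch of the workaround (the fill values are placeholders):

Sub PercentileDemo()
    Dim arr1(1 To 100000, 1 To 1) As Double
    Dim i As Long, Pctile As Double
    For i = 1 To 100000
        arr1(i, 1) = Rnd                   ' placeholder data
    Next i
    ' The single-column 2-D array passes without a type mismatch
    Pctile = Application.WorksheetFunction.Percentile_Inc(arr1, 0.995)
    Debug.Print Pctile
End Sub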
Here is a Classic VBA example that mimics the Excel Percentile function.
Percentile and Confidence Level (Excel-VBA)
In light of Jean's demonstration that the Straight Insertion method is inefficient, I've edited this answer with the following:
I read that QuickSelect performs well with large record sets and is quite efficient at doing so.
References:
Wikipedia.org: Quick Select
A C# implementation can be found at Fast Algorithm for computing percentiles to remove outliers, which should be easily converted to VB.
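To illustrate the idea (this is a generic Quickselect sketch with a Lomuto partition, not a conversion of the linked C# code), the k-th smallest element of a 1-based numeric array can be found like this:

Public Function QuickSelect(a As Variant, ByVal lo As Long, ByVal hi As Long, ByVal k As Long) As Double
    ' Repeatedly partition until the pivot lands at index k
    Dim p As Long
    Do While lo < hi
        p = Partition(a, lo, hi)
        If p = k Then Exit Do
        If p < k Then lo = p + 1 Else hi = p - 1
    Loop
    QuickSelect = a(k)
End Function

Private Function Partition(a As Variant, ByVal lo As Long, ByVal hi As Long) As Long
    ' Lomuto partition: a(hi) is the pivot
    Dim pivot As Double, i As Long, j As Long, t As Double
    pivot = a(hi)
    i = lo - 1
    For j = lo To hi - 1
        If a(j) <= pivot Then
            i = i + 1
            t = a(i): a(i) = a(j): a(j) = t
        End If
    Next j
    t = a(i + 1): a(i + 1) = a(hi): a(hi) = t
    Partition = i + 1
End Function

For the 99.5th percentile of n values, k would sit near 0.995 * n; the exact index and any interpolation depend on which percentile definition you need to match.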
I am relatively new to the world of programming and I was wondering if anybody could help me with a small project. I am trying to create two programs in VB.Net that each do one of the following individual actions:
Find the average grade given several user-inputted scores on assignments. The program should also provide feedback according to the final score (i.e. a letter grade: A, B, C, D, or F).
Run two separate threads printing numbers (or words) in ascending and descending orders. (The numbers (or words) should be given by the user.)
I have a basic understanding of VB.Net, but I am having trouble when it comes to creating even remotely complex programs. I have a few ideas on how I might go about these: for the first question, using an ArrayList that takes user input to find the grades and then a series of If-Then-Else statements to display the letter grade; for the second, possibly using two threads that print the sequence in ascending and descending order. Any help or advice would be greatly appreciated.
P.S.
I will be adding the code I have so far for both of these programs shortly. In the meantime, if you can help me at all with the information I have given you, it would be helpful.
Here's some pseudocode to help you get started on #1:
get list of scores from user input (Console.Readline() if this is a console app)
assign scores to an array or list (List would be a good choice)
get the total score (assuming they're not weighted, if you have a List then you can just use the Sum() method)
get the number of grades (again, if you have a List then you can use Count() method)
divide total by count to get average (if you need a decimal average, you'll have to cast your values to double first, or if your average can be an int just leave them all as int)
use an if-then-else to compare the average to each grade cutoff (if > 90 then "A", else if > 80 then "B", etc)
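Put together, a minimal console sketch of those steps might look like this (the grade cutoffs are assumptions; adjust them to your rubric):

Imports System
Imports System.Collections.Generic
Imports System.Linq

Module GradeAverage
    Sub Main()
        ' Collect scores until the user enters a blank line
        Dim scores As New List(Of Double)
        Console.WriteLine("Enter scores, one per line (blank line to finish):")
        Dim line As String = Console.ReadLine()
        While Not String.IsNullOrWhiteSpace(line)
            scores.Add(Double.Parse(line))
            line = Console.ReadLine()
        End While

        ' Average = total / count (Double avoids integer division)
        Dim average As Double = scores.Sum() / scores.Count

        ' Compare the average to each cutoff, highest first
        Dim letter As String
        If average >= 90 Then
            letter = "A"
        ElseIf average >= 80 Then
            letter = "B"
        ElseIf average >= 70 Then
            letter = "C"
        ElseIf average >= 60 Then
            letter = "D"
        Else
            letter = "F"
        End If

        Console.WriteLine("Average: " & average & "  Grade: " & letter)
    End Sub
End Module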