Need a simple search function to display most common value in a column. (with ambiguous choices) - vba

I have a very large array of data with many columns that display different outputs for the values presented. I would like to add a row above the data that will display the most common occurring value or word below.
Generally I would like to have each top of the column (right under the column label in row 1) have the most common value below. I will then use this value for various data analysis functions!
Is this possible, and if so, how? Preferably this will not require VBA, but simply a short code in the cell.
One caveat: The exact values may vary, so there is no set list where I can say "it will be one of these."
Any ideas appreciated!

Try a series of =COUNTIF(A:A,"VALUE TO SEARCH") functions if you want to stay away from VBA.
Otherwise, the best method would be to iterate through each column via VBA. With this method, you can even count the "varying" values and return the count and/or the value itself.

http://www.excel-easy.com/examples/most-frequently-occurring-word.html
This is a single formula you would write at the top of each column. Does not require VBA. You can replace the set range to an entire column, such as (A:A) instead of (A1:A7).
If you mean an array as in a data type, it could work differently but it depends what you're trying to do.

With data from A3 through A16, in A2 enter:
=INDEX($A$3:$A$16,MODE(MATCH($A$3:$A$16,$A$3:$A$16,0)))
This will work for text as well as numbers. Adjust this to match the column size.

Related

Excel formula not working as expected

I have a sheet that shows max values spent anywhere. So I need to find most expensive place and return it's name. Like this:
Whole sheet.
Function.
Function in text:
=IFS((A6=MAX(D2:D31)),(INDEX(C2:C31,MATCH(A6,D2:D31,0))),(A6=MAX(H2:H31)),(INDEX(G2:G31,MATCH(A6,H2:H31,0))),(A6=MAX(K2:K31)),(INDEX(K2:K31,MATCH(A6,L2:L31,0))))
Basically I need to find a word left to value, matching A6 cell.
Thanks in advance.
Ok.. Overcomplicated!
Firstly, why the three rows? it's a lot easier if you just have one long row with all the data (tell me if you actually need 3 I'll change my solution)
=LOOKUP(MAX(D2:D31);D2:D31;C2:C31)
The MAX formula will lookup the biggest value in the list, the Lookup formula will then match it to the name.
Please note: If more than one object has the maximum price, it will only return the first one. The only way I can think of to bypass that would be to build a macro.
EDIT:
Alright.. Multi Column solution is ugly and requires extra columns that you can just hide.
As you can see you'll need 2 new columns that will find the highest for each row, 2 new columns that will find the value for each of these "highest" (in this case tree and blueberries) and then your visible answer will simply be an if statement finding out which one is bigger and giving the final verdict. This can be expanded with an infinite number of columns but increases complexity.
Here are the formulas:
MAX(H2:H31)
LOOKUP(A5;H2:H31;G2:G31)
MAX(L2:L31)
LOOKUP(C5;L2:L6;K2:K6)
IF(A5>C5;B5;D5)

extract data in exel sheet using macro

you most probably going to think "what an idiot" but remember i never done any type of coding before so this is all new to me,
My problem are that i'm working on a HUGE excel sheet with loads of data that is not needed. i need to sort the data into a few columns, i only need column "A,K,AN,AQ" but in column "AS" i only need certain values (yes,no,blank) i only want the yes and blank values. like i said never done any coding before but i know that you can use an macro to do it so please help, how do i go about this?
before trying to get into macros, try to use functions with if else statements. They are quite easy to handle. Like: If (yes) then put it into X. Later, you could select all needed. Also, check the, how the dollar sign is used
use this links to see, if it is something for you.
One quick and dirty way of getting this job done would be to:
Delete the columns you don't need.
Select all cells in the range you're interested in, click the Insert menu, and choose "Table". If your columns have titles, select the box for "My Table has Headers."
-This turns your data into an array so that Excel recognizes that each row is an entry (instead of thinking that the cells are unrelated).
Now you can use the filter icon in the column headers to select and display only the rows containing the values in column X that you're interested in.
Note that there are some limitations to what the table feature is good for, so, as always, whether this is a good solution for you depends on what you want to do with the data.

OpenRefine - Fill between cells but not at the end of the list

I have a list of stock prices for several stocks. Some of the values are missing due to weekends, holidays and probably other reasons.
The gaps are not consistent. Some are two days and some are more than that.
I want to fill the gaps with the last known value but not at the end of the list.
I have tried in Excel to test a few cells below and if it's now empty, do the fill. The problem is that due to the inconsistency of the gaps, it's a tedious task to change the function for all the cases.
Is there a way to test for the end of a list?
UPDATE - added a screenshot.
See this screenshot. I want to fill where the blue dots are. The red dots are at the end of the list and I don't want to fill those cells.
I am looking for a way to detect the end of the list and stop the filling when the end is detected.
I think this is pretty difficult in OpenRefine and probably a different tool would work better. The main issue is that OpenRefine does not offer the ability to easily work across rows so 'summing a column' (or part of a column) is tricky - this is mentioned in https://github.com/OpenRefine/OpenRefine/issues/200
However, you can do this by forcing OpenRefine in Record mode with the whole project containing a single record. Once you've done this you can access all values in a column using syntax like:
row.record.cells["Column name"].value
This gives an array of all the non-blank values in the column. Since this ignores blank values, in order to have a true view of the values in the column you have to fill in blank cells with a value.
So I think you could probably achieve what you want as follows:
For each column you are going to work with do a cell transform to put a dummy value in empty cells - e.g. if(isBlank(value),"null",value)
Create a new column at the start of your project and put a single value in the very first cell in that column
Switch to Record mode
At this point you should have a single 'Record' in your project - e.g.
You can now access all cells in a column using syntax like row.record.cells["Column 1"].value. You can combine this with 'forRange' to iterate through the contents of this array, using the row.index as the marker for the current row.
I used the following formula to add a new column to the project:
with(row.record.cells["Column 1"].value,w,if(forRange(row.index,w.length(),1,i,w[i].toNumber()).sum()>0,"a","b"))
Then...
Change back to 'Row' mode
Remove the 'null' placeholder from the original column
Create a facet on the 'fill filter' column
In my case I filter to 'a'
Use the 'fill down' option
Remove the filter
And remove the 'record' column
Rather a long winded way of doing it to say the least, but so far I've not been able to find anything better while not going outside OpenRefine. I'm guessing you could probably compress steps 5-11 into a single step or smaller number of steps.
If you want to access the array of cell values using Jython as suggested by iMitwe you need to use:
row["record"]["cells"]["Column 1"]["value"]
instead of
row.record.cells["Column 1"].value
(step 5)
I am doing this on the top of my head, but I think your best chance my be using the fill down option in record mode:
first move your column to the first column and switch to record mode.
then use the following GREL: row.record.cells["data"].value[-1] where data is the name of your column
The [-1] will take the last value and fill the blank. For the case with the red dot, since there is no value it should remains empty. Let us know how it goes.
Unless there's something I am missing or not seeing...
I would have just sorted reverse (date ascending) on the Date column, then individually use Fill Down on each column, except for that last column where you could then use a Date facet on your column Date to specify the exact Date range you wanted to work with, then fill down on that last column, then remove the Date range facet.

Count unique string variants

There could be quite a simple solution to this, but I am trying to find the number of times a unique variant (i.e. non-duplicates) of a string appears in a column. However this string is only part of the text contained in a cell, and not the entire cell. To illustrate:
EuropeSpainMadrid
EuropeSpainBarcelona
AsiaChinaShanghai
AsiaJapanTokyo
EuropeEnglandLondon
EuropeSpainMadrid
I would like to find how many unique instances there are of a string that contains "EuropeSpain". So using this example, I would find that a variant of "EuropeSpain" appears only twice (given that the second instance of "EuropeSpainMadrid" is a duplicate).
A solution to this is to use pivots to summarise the data and remove duplicated; however given that my underlying dataset changes often this would require manual adjustments and corrections. I would therefore like to avoid adding any intermediate steps (i.e. PivotTables, other data sets etc) between my data and the counts.
UPDATE: I now understand to use wildcards to solve the first part of my question (counting the occurrences of "EuropeSpain"), however I am not yet clear on the second part of my question (how to find the number of unique occurrences).
Is there a formula or VBA code that could do this?
Using wildcards:
=COUNTIF(A1:A6,"="&"*"&C1&"*")
For without VBA but with some versatility, I suggest with Text in ColumnA (labelled), ColumnB labelled Flag and EuropeSpain in C1:
=FIND(C$1,A2)
in B2 copied down.
Then pivot A:B with Flag for FILTERS (and 1 selected), Text for ROWS and Count of Text for Sigma VALUES.
Apply Distinct Values if required (and available!), alternatively a formula of the kind:
=MATCH("Grand Total",E:E)-4
would count uniques.

Use columns.add(...) in Word with non-uniform column widths?

Problem I'm having is that table.Columns.add(ref Object BeforeColumn) requires a reference to another column in the table. However, when I try to access the last column in the table to pass as a reference using table.Columns.Add(table.Columns[table.Columns.Count])
I get the error:
"Cannot access individual columns in this collection because the table has mixed cell widths."
As my current work around, I catch the error, and call table.Columns.DistributeWidth() to make sure the columns are uniform and run the rest of the code. However, I lose the formatting of my cell widths this way, which is unfortunate.
Is there any way I can workaround this without losing the cell width?
(I realize one way is to store every cell's width before running this process, and then re-applying the widths afterward, but this seems like a very costly solution to something that should be simpler)
I've found one way to do it. Here's how I approached it.
*Caution, I'm assuming that the table is uniform. i.e. The number of columns is the same across all the rows. (Note, the API has a Table.uniform function, but the description is not complete. In the API it says "True if all the rows in a table have the same number of columns." However, it also checks if the columns have uniform width).
Instead of using table.Columns.Add(table.Columns[table.Columns.Count]) to add a column before the last below, I select a cell in the table and used the insert command:
//assuming table is the name of the table you want to add columns to
table.Cell(1, table.Columns.Count).Select();
word.Selection selection = table.Application.ActiveWindow.Selection;
selection.InsertColumns();
This might actually be a better way to add columns, as the api gives you way more options on how to insert (i.e. use InsertColumnsRight to insert to the right of the column). The Columns.Add(...) function by default inserts to the left of the select