sum function in GREL in OpenRefine - openrefine

In OpenRefine, I'm trying to increase the value of every number in a column by 1.
The GREL expression sum([value],1) gives me Error: sum expects an array of numbers.
I guess I don't know how to produce an array of numbers. When I use a different function on the same column, such as tan([value]), I get the result I want.

I think you misunderstood the use of sum(). If you just want to add 1 to each cell, just use value + 1.
Make sure, however, that your column contains numbers (in green) and not strings (in black). If in doubt, use toNumber(value) + 1 instead.
The sum() function adds up all the numbers contained in an array, for instance sum([1,2,3,4]) = 10, but you have no array if each cell of your column contains a single number.
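To make the distinction concrete, here is a rough sketch in Python (not GREL) that models a column as a list of cell values. The helper to_number is hypothetical and only imitates what toNumber(value) is used for above; the sample values are made up.

```python
def to_number(cell):
    """Hypothetical stand-in for toNumber(): parse strings, pass numbers through."""
    if isinstance(cell, str):
        return float(cell) if "." in cell else int(cell)
    return cell

# An expression is evaluated one cell at a time, so "value + 1" runs once per cell.
column = [3, "4", 10.5]
incremented = [to_number(value) + 1 for value in column]

# sum() is different: it collapses one array into a single number.
array_total = sum([1, 2, 3, 4])

print(incremented)
print(array_total)
```

The point is only that adding 1 is a per-cell operation, while sum() needs a whole array as its input.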

Related

Extracting "hidden" data from expanding/collapsing pivot table - Excel

I'm not sure if this is possible but as you can see I have a pivot table with multiple dependent and expandable fields. I am trying to concatenate the data from columns A:D into one cell which works fine in row 2 but doesn't work with blank parent cells, as you can see in column F.
Any ideas for how to achieve this?
Pivot table
This answer assumes that you don't want to just use Repeat All Item Labels from the "Report Layout" drop-down on the PivotTable Tools "Design" tab.
A formula to get the first non-blank value on or above the same row as the current cell from Column B can be constructed with a combination of AGGREGATE, SUMPRODUCT and OFFSET, like so:
=OFFSET($B2,SUMPRODUCT(AGGREGATE(14,6,ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0),1))-ROW(),0)
How does it work?
Starting with the outermost part, OFFSET($B2, VALUE, 0) - this will start in cell B2, then look up or down by VALUE rows to get the value.
Next we need to know how many rows we will need to look up-or-down. Now, if we can work out the bottom-most row with data, we can subtract the current ROW() from that, giving us OFFSET($B2, NON_BLANK-ROW(),0)
So, to finish up we need to work out which rows are not blank, AND which rows are on-or-above our current row, then take the largest of those. This is going to take an array formula, but we can use SUMPRODUCT to make that calculate properly. To find the largest number we could use MAX or LARGE, but we get fewer errors if we pick AGGREGATE(14,6,..,1). (The 14 means "we want the kth largest number", the 6 means "ignore error values", and the 1 is k, so "we want the largest number, ignoring errors".)
But, what list of numbers are we going to look at, I don't hear you ask. Well, we want the ROW for output from our range (I'm using $B$1:$B$100, because using the whole column B would take far too long to calculate repeatedly), a comparison against the current ROW(), and a check that the LENgth is > 0. Those last two are comparisons, so let's write them out first:
ROW($B$1:$B$100)<=ROW()
and
LEN($B$1:$B$100)>0
We want to use -- to convert TRUE and FALSE to 1 and 0 - this means that any "bad" values become 0, and any "good" values are larger than 0:
ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0)
This gives us the Row number when the Row is on-or-before the current row AND Column B is not blank - if either of those are False, then we get 0 instead. Stick that in the AGGREGATE to find the largest number:
AGGREGATE(14, 6, ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0), 1)
Then put it in a SUMPRODUCT to force Excel to treat it as an array formula, and that's your NON_BLANK. Substituting it into OFFSET($B2, NON_BLANK-ROW(), 0) gives you the first formula right at the top of the post.
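For readers who find the nested formula hard to follow, the same logic can be sketched in Python. This is only an illustration of the idea, not something Excel runs; the function names and the sample column are assumptions made for the example.

```python
def last_non_blank_row(cells, current_row):
    """Return the 1-based row of the last non-blank cell on or above current_row.

    Mirrors ROW(range) * (ROW(range) <= ROW()) * (LEN(range) > 0), then the max.
    """
    scores = [
        row * (row <= current_row) * (len(str(text)) > 0)
        for row, text in enumerate(cells, start=1)
    ]
    return max(scores)  # 0 means no row qualified


def fill_down(cells, current_row):
    """Value of the nearest non-blank cell on or above current_row (the OFFSET step)."""
    row = last_non_blank_row(cells, current_row)
    return cells[row - 1] if row else ""


# Hypothetical column B, rows 1 to 6, with blank cells under each parent label.
column_b = ["Region", "North", "", "", "South", ""]
print(fill_down(column_b, 4))
print(fill_down(column_b, 6))
```

Multiplying by the two comparisons zeroes out every row that is either blank or below the current row, which is why taking the maximum of what is left finds the parent label.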

Lookup most used text in range based on criteria

I have the following CSE formula to return the most used text in a range, excluding empty cells.
=INDEX(A4:D4,MODE(IF(A4:D4<>"",MATCH(A4:D4,A4:D4,0))))
My problem is that the formula returns #N/A when there is only one value in the range. How can I adjust the formula to return that value?
If the only problem is a single value being present, and you want to retrieve that value, use this CSE formula:
=IF(COUNTIF(A4:D4,"*")=1, INDEX(A4:D4,MATCH(FALSE,ISBLANK(A4:D4),0)),INDEX(A4:D4,MODE(IF(A4:D4<>"",MATCH(A4:D4,A4:D4,0)))))
Otherwise, to handle the case where all values are distinct (so there is no mode at all), you can count the distinct values and compare that count against the number of columns. If they are equal there is no mode, so use the IF statement to fall back to a default in the TRUE branch.
=IF(SUMPRODUCT(1/COUNTIF(A4:D4,A4:D4))=COLUMNS(A4:D4),"Do Something",INDEX(A4:D4,MODE(IF(A4:D4<>"",MATCH(A4:D4,A4:D4,0)))))
Again, a CSE so enter with Ctrl + Shift + Enter.
This part of the formula above counts the distinct values:
SUMPRODUCT(1/COUNTIF(A4:D4,A4:D4))
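The 1/COUNTIF trick and the fallback can be sketched in Python as follows. This is an illustration under assumptions, not a translation of the formula: the function names are made up, blanks are dropped before counting (the Excel formula does not do that), and a lone value is returned as-is.

```python
from collections import Counter


def distinct_count(values):
    """Count distinct values the way SUMPRODUCT(1/COUNTIF(rng, rng)) does."""
    counts = Counter(values)
    # Each occurrence contributes 1/n, so every distinct value adds up to 1.
    return round(sum(1 / counts[v] for v in values))


def most_used(values, fallback="Do Something"):
    """Most frequent non-empty text, with fallbacks when there is no mode."""
    filled = [v for v in values if v != ""]  # assumption: ignore blanks up front
    if not filled:
        return fallback
    if len(filled) == 1:
        return filled[0]  # single value present: just return it
    if distinct_count(filled) == len(filled):
        return fallback  # every value is distinct, so there is no mode
    return Counter(filled).most_common(1)[0][0]


print(most_used(["red", "blue", "red", "green"]))
print(most_used(["a", "b", "c", "d"]))
print(most_used(["", "x", "", ""]))
```

The rounding is there only to guard against floating-point error when the fractions are summed.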

How to pull out corresponding column and row figures from a two dimensional data table?

peeps.
I currently have a two-dimensional data table, which measures the sensitivity of profit to two inputs.
The whole data table is D183:AI234. I find the maximum profit by taking max() over the table, and I was wondering what formula would return the two corresponding input values (one from the row header, one from the column header) that produce that maximum.
Things I've tried: HLOOKUP and VLOOKUP to get the inputs from the row and column, but I get #N/A from both.
For example, to get the input from the row, based on look_up value, I used this formula: =HLOOKUP(E237,D183:AI234,1,0)
Kind Regards
Data Table:
Here's a hacky way to do it using array functions. I assume there is something cleaner.
Edit: I misread the original question.
I am assuming that the data itself is in the range D183:AI234--the labels for the y-category are in C183:C234 and the labels for the x-category are in D182:AI182.
To find the row of the maximum value: MAX((D183:AI234 = MaxVal)*ROW(D183:D234))
With the row number there are a variety of options for actually accessing the value in your y-labels:
You can OFFSET from the upper left of the table (assumed to be C182): OFFSET(C182, MAX((D183:AI234 = MaxVal)*ROW(D183:D234)) - ROW(C182), 0).
You can access the labeled cell using INDIRECT and ADDRESS, the row number, and a column identifier for the labels. INDIRECT(ADDRESS(MAX((D183:AI234 = MaxVal)*ROW(D183:D234)), COLUMN(C182)))
You can determine the relative position among the labels and use INDEX with that position to retrieve the value. INDEX(C183:C234, MAX((D183:AI234 = MaxVal)*ROW(D183:D234)) - ROW(C182))
Note that these are all array functions and must be entered with CTRL + SHIFT + ENTER.
I'd prefer the INDEX approach as it is non-volatile (both OFFSET and INDIRECT are volatile functions and will recalculate every time a change is made to the sheet or Excel recalculates), and I generally consider it to be better practice.
To get the x-value you identify the column of the maximum with MAX((D183:AI234 = MaxVal)*COLUMN(D183:AI183)) and adapt whichever of the three methods you choose.
Original answer to find the address of the max (but not the associated category values) below:
To find the row of the maximum entry you want to multiply a boolean array where your values match the maximum, so, that is (D183:AI234 = MaxVal) by the row numbers, so ROW(D183:D234). The result of that multiplication is a vector of (0,0,..,Row of Maximum Val,...), so you take the MAX of that to find the row number.
The same is true for the column, but you would use COLUMN(D183:AI183). Then you can get a cell address using the ADDRESS function.
Putting it all together...
=ADDRESS(MAX((D183:AI234 = MaxVal)*ROW(D183:D234)),MAX((D183:AI234 = MaxVal)*COLUMN(D183:AI183)))
This must be entered as an array function (CTRL + SHIFT + ENTER)
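The "boolean mask times position" idea can be sketched in Python like this. The function name, the small table and its labels are all made up for illustration; the real table is D183:AI234 with labels in C183:C234 and D182:AI182.

```python
def locate_max(table, row_labels, col_labels):
    """Return (row_label, col_label, value) for the largest entry in a 2-D table."""
    best = max(max(row) for row in table)
    # Like MAX((data = MaxVal) * ROW(...)): keep the 1-based position where the
    # cell equals the maximum, 0 elsewhere, then take the largest position.
    row_pos = max(i * (cell == best)
                  for i, row in enumerate(table, start=1) for cell in row)
    col_pos = max(j * (cell == best)
                  for row in table for j, cell in enumerate(row, start=1))
    # Caveat: if the maximum occurs in several cells, row_pos and col_pos are
    # found independently and may not belong to the same cell.
    return row_labels[row_pos - 1], col_labels[col_pos - 1], best


# Hypothetical sensitivity table: rows are one input, columns the other.
table = [
    [10, 25, 5],
    [40, 15, 30],
]
row_labels = ["low", "high"]
col_labels = [0.1, 0.2, 0.3]
print(locate_max(table, row_labels, col_labels))
```

Subtracting 1 from the position plays the same role as subtracting ROW(C182) in the INDEX version: it turns an absolute position into an offset into the label list.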

Count of cells in a certain column that do not contain a specified value

I can't find anywhere a function that would allow me to count every cell in a column except a certain value given.
For example there is a column where the values are (-), 1, 2 and 3 , I want to count all the cells that are not "(-)".
So there should be a function like
=countif(A1:A100, Not "(-)")
Does anyone know how to do that?
This can be done using <> for not equal, like so:
=COUNTIF(A1:A100, "<>-")
Replace the - with what you want to not count, and change A1:A100 to the range of data you would like the formula to consider.
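As a sanity check on what the formula counts, the same "not equal" test looks like this in Python. The sample cells and variable names are made up for the example.

```python
# Hypothetical column contents, using the values from the question.
cells = ["(-)", 1, 2, "(-)", 3]
excluded = "(-)"

# Count every cell whose content differs from the excluded value.
count = sum(1 for cell in cells if cell != excluded)
print(count)
```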

Excel formula/macro for finding data with letters and numbers

I have a column of data in Excel that contains different values. I am looking for a formula or macro to distinguish the different types of data. For instance, I have a VLOOKUP for numerical values =VLOOKUP(E2,TECH!B:F,4,FALSE) but this only works for certain values.
For instance, this returns the value of E2 when it's listed as a 4 digit extension in the column. Some data points are listed as "i78990" or "n65778", etc. I want to return a value of "Chicago" when an "i" is before the number and an "Atlanta" if the "n" is before the number, etc.
Use the LEFT function to get the first letter, as follows: LEFT(E2,1). You can apply that to all of your cells, store the result in a different column, and perform the VLOOKUP from there.
For a more general case of separating numbers from text you can use the following algorithm:
-Break the alphanumeric string into separate characters.
use: MID(A1,ROW($1:$9),1)
-Determine whether there is a number in the decomposed string.
use: ISNUMBER(1*MID(A1,ROW($1:$9),1))
-Determine the position of the number in the alphanumeric string.
use: MATCH(TRUE,ISNUMBER(1*MID(A1,ROW($1:$9),1)),0)
-Count the numbers in the alphanumeric string.
use: COUNT(1*MID(A1,ROW($1:$9),1))
or as a whole:
MID(A1,MATCH(TRUE,ISNUMBER(1*MID(A1,ROW($1:$9),1)),0),COUNT(1*MID(A1,ROW($1:$9),1)))
you can see an example in the following link.
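The steps above can also be sketched in Python, which may make the roles of MATCH and COUNT easier to see. The function names and the prefix table are hypothetical; the question only mentions the "i" and "n" prefixes.

```python
def split_prefix_and_digits(code):
    """Split a code such as 'i78990' into its leading letters and its digits."""
    # Position of the first digit, like MATCH(TRUE, ISNUMBER(1*MID(...)), 0).
    first_digit = next((i for i, ch in enumerate(code) if ch.isdigit()), len(code))
    # Number of digit characters, like COUNT(1*MID(...)).
    digit_count = sum(ch.isdigit() for ch in code)
    return code[:first_digit], code[first_digit:first_digit + digit_count]


# Hypothetical lookup table built from the two examples in the question.
CITY_BY_PREFIX = {"i": "Chicago", "n": "Atlanta"}


def city_for(code):
    """City for the code's first letter, or an empty string if there is none."""
    prefix, _ = split_prefix_and_digits(code)
    return CITY_BY_PREFIX.get(prefix[:1].lower(), "")


print(split_prefix_and_digits("i78990"))
print(city_for("n65778"))
```

Codes without a letter prefix, such as a plain 4-digit extension, return an empty string here and would still go through the original VLOOKUP.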