How to add a column that shows the difference between two existing columns? GREL in OpenRefine - openrefine

I'm trying to find a simple way to create a new column that displays the difference between two existing columns (each containing numbers), but I can't seem to find the proper GREL expression.
I want the number of items sold, given one column named "Stock_before" and another named "Stock_after".
On the "Stock_before" column I click "Edit column", then "Add column based on this column".
The GREL expression I have entered is:
value-cells["Stock_after"]
It returns no syntax error, but all of the cells in the preview still say "null". I have already transformed the values of both columns to numbers.
For Python I have tried:
substract(value,"Stock_after")
Same result: no syntax error, but everything is still null.
This seems so ridiculously simple, but I couldn't find an answer. You can guess I'm fairly new to all this :) Hope someone out there can help me!
Thanks for having the patience to read this, and thanks for your time if you answer!
I'd like something similar to this (3 columns):
Stock_before, Stock_after, dif
1,1,0
3,1,2
4,4,0
2,1,1

In GREL, the expression cells["Stock_after"] returns a Cell object representing the corresponding cell, not the actual value of that cell. To get the value, you need to use cells["Stock_after"].value.
So your final GREL expression should be value - cells["Stock_after"].value.
You should also make sure your values are stored as numbers, not strings: numbers appear in green in the table. If they do not, use a "To number" operation on both columns first.
You can find out more about GREL and Cell objects here:
https://github.com/OpenRefine/OpenRefine/wiki/Variables
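As a sanity check outside OpenRefine, the expected dif column for the sample data in the question can be reproduced in plain Python. This is only a sketch: the function name add_difference is made up for illustration, and the explicit int() conversion plays the role of the "To number" step.

```python
import csv
import io

# Sample data copied from the question (without the expected "dif" column).
SAMPLE = """Stock_before,Stock_after
1,1
3,1
4,4
2,1
"""

def add_difference(csv_text):
    """Return the rows with an extra 'dif' key: Stock_before - Stock_after."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # CSV fields arrive as strings, so convert before subtracting.
        before = int(row["Stock_before"])
        after = int(row["Stock_after"])
        rows.append({"Stock_before": before, "Stock_after": after, "dif": before - after})
    return rows

for r in add_difference(SAMPLE):
    print(r["Stock_before"], r["Stock_after"], r["dif"])
```

The printed rows should match the three-column table in the question.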

Related

Excel formula not working as expected

I have a sheet that shows the max values spent anywhere. I need to find the most expensive place and return its name. Like this:
(Screenshot: whole sheet.)
(Screenshot: function.)
Function in text:
=IFS((A6=MAX(D2:D31)),(INDEX(C2:C31,MATCH(A6,D2:D31,0))),(A6=MAX(H2:H31)),(INDEX(G2:G31,MATCH(A6,H2:H31,0))),(A6=MAX(K2:K31)),(INDEX(K2:K31,MATCH(A6,L2:L31,0))))
Basically, I need to find the word to the left of the value that matches cell A6.
Thanks in advance.
Ok.. Overcomplicated!
Firstly, why the three sets of columns? It's a lot easier if you just have one long list with all the data (tell me if you actually need three and I'll change my solution).
=LOOKUP(MAX(D2:D31);D2:D31;C2:C31)
MAX finds the biggest value in the list, and LOOKUP then matches it to the name. Note that LOOKUP expects D2:D31 to be sorted in ascending order; on unsorted data, =INDEX(C2:C31;MATCH(MAX(D2:D31);D2:D31;0)) is the safer equivalent.
Please note: if more than one item has the maximum price, only one of them is returned. The only way I can think of to bypass that would be to build a macro.
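To make the tie caveat concrete, here is a small Python sketch (not a spreadsheet formula) that returns every name sharing the maximum value. The function name names_at_max and the sample places and amounts are invented for illustration.

```python
def names_at_max(names, values):
    """Return every name whose value equals the maximum, in original order."""
    if not values:
        return []
    top = max(values)
    return [name for name, value in zip(names, values) if value == top]

# Hypothetical data: two places tie for the highest amount spent.
places = ["cafe", "cinema", "market", "garage"]
spent = [12.5, 40.0, 8.0, 40.0]
print(names_at_max(places, spent))  # ['cinema', 'garage']
```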
EDIT:
Alright.. The multi-column solution is ugly and requires extra columns, which you can just hide.
You'll need 2 new cells that find the highest value in each list, 2 more that find the name for each of these "highest" values (in this case tree and blueberries), and then your visible answer is simply an IF statement that finds out which one is bigger and gives the final verdict. This can be expanded to any number of columns, but the complexity increases.
Here are the formulas:
A5: =MAX(H2:H31)
B5: =LOOKUP(A5;H2:H31;G2:G31)
C5: =MAX(L2:L31)
D5: =LOOKUP(C5;L2:L31;K2:K31)
Visible answer: =IF(A5>C5;B5;D5)
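The logic of the helper cells (one maximum per name/value pair of columns, then a comparison of those maxima) can be sketched in Python as follows. The function name most_expensive and the numbers are assumptions made for illustration; only "tree" and "blueberries" come from the example above.

```python
def most_expensive(*pairs):
    """Each pair is (names, values); return (name, value) of the overall maximum."""
    best = None
    for names, values in pairs:
        for name, value in zip(names, values):
            # Strict '>' keeps the first entry seen when values tie.
            if best is None or value > best[1]:
                best = (name, value)
    return best

# Hypothetical contents of the G/H and K/L column pairs.
group_gh = (["tree", "bench"], [30, 12])
group_kl = (["blueberries", "milk"], [25, 3])
print(most_expensive(group_gh, group_kl))  # ('tree', 30)
```

Adding a third pair of columns is just another argument here, which is where a script scales better than extra helper cells.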

Query several ranges and add automatically a column to know the source of each row

I am trying to achieve the following in Google Spreadsheets.
First, I want to query several ranges (in different sheets of the same spreadsheet). I tried a formula like =query(arrayformula({indirect(E2:E10)}),"select * where Col1 <>''"), with no success.
In E2:E10 I have a list of ranges. Column F contains a name that describes the source of the value in Column E.
My second problem is that I need to add a column to the output of that query that tells me the origin of each row.
For example, if each source is a 3-column range for one country, I need to merge those tables and add the country to each row.
All credits to +Ben Liebrand who helped me out here: https://support.google.com/docs/profile/3464
"I just want to start of by saying that the indirect() function does not work in an arrayformula() function as expected. So you will need to take another approach. I can understand what you are trying to do so I added another TAB in your spreadsheet to demonstrate another approach. I know it was initially a specific design you were trying so I made some changes to what you had. Maybe you can take a look at what I have offered and maybe you can tweak your design.
I know what I am offering is just very rough but you will also notice that I removed the end row specifier from your ranges in the range table.
Don't assume my example to be the final result but I was just trying to show that the range you were trying to use with the indirect() function will not work.
So hopefully this will give you a new idea of how you can maybe handle this.
My formula also adds the country to each of the tables in the output. My formula looks like this
=query(ArrayFormula({
if(len(indirect(regexextract(F2,"\w+\!\w+")&":A")),G2,),indirect(F2);
if(len(indirect(regexextract(F3,"\w+\!\w+")&":A")),G3,),indirect(F3);
if(len(indirect(regexextract(F4,"\w+\!\w+")&":A")),G4,),indirect(F4);
if(len(indirect(regexextract(F5,"\w+\!\w+")&":A")),G5,),indirect(F5);
if(len(indirect(regexextract(F6,"\w+\!\w+")&":A")),G6,),indirect(F6);
if(len(indirect(regexextract(F7,"\w+\!\w+")&":A")),G7,),indirect(F7)
})," select * where Col1 <> '' ")
Hope this is of some help to you"
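For readers who want to see the idea independently of Google Sheets, the same "stack several tables and tag each row with its source" operation can be sketched in Python. The function name merge_with_source, the country names, and the row values are all hypothetical; the check on the first cell mirrors the "where Col1 <> ''" filter.

```python
def merge_with_source(tables):
    """tables maps a source label to a list of rows; return one flat list
    in which each kept row is prefixed with its source label."""
    merged = []
    for label, rows in tables.items():
        for row in rows:
            # Skip rows whose first cell is empty.
            if row and row[0] != "":
                merged.append([label, *row])
    return merged

# Hypothetical 3-column tables, one per country.
tables = {
    "Spain": [["a", 1, 2], ["", "", ""]],
    "Chile": [["b", 3, 4]],
}
for row in merge_with_source(tables):
    print(row)
```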
And I hope this is useful to the community.
Gerónimo

OpenRefine - Fill between cells but not at the end of the list

I have a list of stock prices for several stocks. Some of the values are missing due to weekends, holidays and probably other reasons.
The gaps are not consistent. Some are two days and some are more than that.
I want to fill the gaps with the last known value but not at the end of the list.
In Excel, I have tried testing a few cells below the current one and doing the fill only if one of them is not empty. The problem is that, because the gaps are inconsistent, it's tedious to adapt the function to all the cases.
Is there a way to test for the end of a list?
UPDATE - added a screenshot.
See this screenshot. I want to fill where the blue dots are. The red dots are at the end of the list and I don't want to fill those cells.
I am looking for a way to detect the end of the list and stop the filling when the end is detected.
I think this is pretty difficult in OpenRefine and probably a different tool would work better. The main issue is that OpenRefine does not offer the ability to easily work across rows so 'summing a column' (or part of a column) is tricky - this is mentioned in https://github.com/OpenRefine/OpenRefine/issues/200
However, you can do this by forcing OpenRefine into Record mode with the whole project containing a single record. Once you've done this you can access all values in a column using syntax like:
row.record.cells["Column name"].value
This gives an array of all the non-blank values in the column. Since this ignores blank values, in order to have a true view of the values in the column you have to fill in blank cells with a value.
So I think you could probably achieve what you want as follows:
1. For each column you are going to work with, do a cell transform to put a dummy value in empty cells, e.g. if(isBlank(value),"null",value)
2. Create a new column at the start of your project and put a single value in the very first cell in that column
3. Switch to Record mode
At this point you should have a single 'Record' in your project.
You can now access all cells in a column using syntax like row.record.cells["Column 1"].value. You can combine this with 'forRange' to iterate through the contents of this array, using the row.index as the marker for the current row.
4. I used the following formula to add a new column to the project:
with(row.record.cells["Column 1"].value,w,if(forRange(row.index,w.length(),1,i,w[i].toNumber()).sum()>0,"a","b"))
Then...
5. Change back to 'Row' mode
6. Remove the 'null' placeholder from the original column
7. Create a facet on the 'fill filter' column
8. In my case I filter to 'a'
9. Use the 'fill down' option
10. Remove the filter
11. And remove the 'record' column
Rather a long-winded way of doing it, to say the least, but so far I've not been able to find anything better without going outside OpenRefine. I'm guessing you could probably compress steps 5-11 into a single step or a smaller number of steps.
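If going outside OpenRefine is acceptable, the goal of the steps above (fill blanks with the last known value, but leave the blanks at the end of the list alone) can be sketched in a few lines of Python. This is an illustration under assumptions: blanks are represented as None, and the function name fill_inner_gaps and the sample prices are made up.

```python
def fill_inner_gaps(values):
    """Forward-fill None gaps, leaving trailing Nones untouched."""
    # Index of the last real value; nothing after it is filled.
    last_real = max((i for i, v in enumerate(values) if v is not None), default=-1)
    filled = list(values)
    previous = None
    for i in range(last_real + 1):
        if filled[i] is None:
            filled[i] = previous  # stays None if the column starts with blanks
        else:
            previous = filled[i]
    return filled

# Hypothetical price column with inner gaps and two trailing blanks.
prices = [10.0, None, None, 11.5, None, 12.0, None, None]
print(fill_inner_gaps(prices))
# [10.0, 10.0, 10.0, 11.5, 11.5, 12.0, None, None]
```

Finding the index of the last real value first is what "detecting the end of the list" amounts to: everything before it is an inner gap, everything after it is left blank.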
If you want to access the array of cell values using Jython as suggested by iMitwe you need to use:
row["record"]["cells"]["Column 1"]["value"]
instead of
row.record.cells["Column 1"].value
(step 5)
I am doing this off the top of my head, but I think your best chance may be using the fill down option in Record mode:
First, move your column to the first position and switch to Record mode.
Then use the following GREL: row.record.cells["data"].value[-1], where data is the name of your column.
The [-1] will take the last value and fill the blank. For the case with the red dots, since there is no value, the cell should remain empty. Let us know how it goes.
Unless there's something I am missing or not seeing...
I would have just sorted reverse (date ascending) on the Date column, then individually use Fill Down on each column, except for that last column where you could then use a Date facet on your column Date to specify the exact Date range you wanted to work with, then fill down on that last column, then remove the Date range facet.

Need a simple search function to display most common value in a column. (with ambiguous choices)

I have a very large array of data with many columns that display different outputs for the values presented. I would like to add a row above the data that will display the most common occurring value or word below.
Generally I would like to have each top of the column (right under the column label in row 1) have the most common value below. I will then use this value for various data analysis functions!
Is this possible, and if so, how? Preferably this will not require VBA, but simply a short code in the cell.
One caveat: The exact values may vary, so there is no set list where I can say "it will be one of these."
Any ideas appreciated!
Try a series of =COUNTIF(A:A,"VALUE TO SEARCH") functions if you want to stay away from VBA.
Otherwise, the best method would be to iterate through each column via VBA. With this method, you can even count the "varying" values and return the count and/or the value itself.
http://www.excel-easy.com/examples/most-frequently-occurring-word.html
This is a single formula you would write at the top of each column. It does not require VBA. You can replace the fixed range with an entire column, such as (A:A) instead of (A1:A7).
If you mean an array as in a data type, it could work differently but it depends what you're trying to do.
With data from A3 through A16, in A2 enter:
=INDEX($A$3:$A$16,MODE(MATCH($A$3:$A$16,$A$3:$A$16,0)))
This will work for text as well as numbers. Adjust this to match the column size.
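For comparison, the same "most common value in a column" lookup can be sketched in Python with collections.Counter. The function name most_common_value and the sample column are illustrative assumptions; blanks are skipped, and when counts tie the value seen first wins.

```python
from collections import Counter

def most_common_value(column):
    """Return the most frequent non-blank value, or None for an empty column."""
    counts = Counter(v for v in column if v not in ("", None))
    if not counts:
        return None
    # most_common(1) returns [(value, count)]; ties go to the value seen first.
    return counts.most_common(1)[0][0]

# Hypothetical column mixing text, a number, and a blank cell.
column = ["red", "blue", "red", 7, "blue", "red", ""]
print(most_common_value(column))  # red
```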

Build a Column by applying a formula to an existing column (like in excel)

I am new to the community. Hopefully this was not answered already.
I am trying to add a column to a DataFrame that is computed by a formula from the existing columns. For example, building a series of stock returns from the stock's closing prices.
I know how to build a column by applying exactly the same operation to every element of another column, but not how to use a column's elements in a formula to create another.
Thanks for your help.