OpenRefine: add counter to value - openrefine

I'm using a text facet to get only rows that include a certain value. With the resulting rows I'd like to fill down a column with values from another column. This is the expression I'm using:
cells["Auto_Objektkennung"].value
How could I add a running number, starting at 0, to every value?
Pseudocode:
cells["Auto_Objektkennung"].value + '-' + COUNTER+1
Unfortunately, the row index does not help: because of the text facet, I'm not starting at one but somewhere around 8000.
cells["Auto_Objektkennung"].value + '-' + row.record.index

Here is a manual way to achieve this with OpenRefine and GREL without delegating the task to Clojure or Jython.
Idea: We can first create records based on a text facet or text filter.
Then we can use row.record.index to create the expected "continuing number[s]".
Recipe:
With your text facet (or filter) active, add a new column named "record_marker".
Move the new column "record_marker" to the beginning.
Add a new column "counter" using the expression row.record.index - 1.
Blank down the new "counter" column.
You can now use the "counter" column in your expression:
if(cells["counter"].value >= 0, cells["Auto_Objektkennung"].value + "-" + cells["counter"].value, "")
Clean up by deleting the "record_marker" and "counter" columns.
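The result this recipe aims for can be sketched outside OpenRefine. The following Python snippet is only an illustration of the intended outcome, not OpenRefine's API: the function name add_counter, the "type" column and the sample values are hypothetical.

```python
def add_counter(rows, column, matches):
    """Append '-<n>' to `column` in every matching row, with n starting at 0.

    Rows that do not match the filter are copied unchanged and do not
    advance the counter.
    """
    counter = 0
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        if matches(row):
            new_row[column] = f"{row[column]}-{counter}"
            counter += 1
        result.append(new_row)
    return result


# Hypothetical sample data: only rows with type == "keep" pass the facet.
rows = [
    {"Auto_Objektkennung": "X1", "type": "skip"},
    {"Auto_Objektkennung": "X2", "type": "keep"},
    {"Auto_Objektkennung": "X3", "type": "skip"},
    {"Auto_Objektkennung": "X4", "type": "keep"},
]
numbered = add_counter(rows, "Auto_Objektkennung", lambda r: r["type"] == "keep")
print([r["Auto_Objektkennung"] for r in numbered])
# ['X1', 'X2-0', 'X3', 'X4-1']
```

The counter depends only on how many matching rows came before, which is why the position of the row in the full table (around 8000 in the question) does not matter.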

Related

Extracting "hidden" data from expanding/collapsing pivot table - Excel

I'm not sure if this is possible but as you can see I have a pivot table with multiple dependent and expandable fields. I am trying to concatenate the data from columns A:D into one cell which works fine in row 2 but doesn't work with blank parent cells, as you can see in column F.
Any ideas for how to achieve this?
Pivot table
This answer assumes that you don't want to just use Repeat All Item Labels from the "Report Layout" drop-down on the PivotTable Tools "Design" tab.
A formula to get the first non-blank value on or above the same row as the current cell from Column B can be constructed with a combination of AGGREGATE, SUMPRODUCT and OFFSET, like so:
=OFFSET($B2,SUMPRODUCT(AGGREGATE(14,6,ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0),1))-ROW(),0)
How does it work?
Starting with the outermost part, OFFSET($B2, VALUE, 0) - this will start in cell B2, then look up or down by VALUE rows to get the value.
Next we need to know how many rows we will need to look up-or-down. Now, if we can work out the bottom-most row with data that is on or above the current row, we can subtract the current ROW() from that, giving us OFFSET($B2, NON_BLANK-ROW(),0)
So, to finish up we need to work out which rows are not blank, AND which rows are on-or-above our current row, then take the largest of those. This is going to take an array formula, but we can use SUMPRODUCT to make that calculate properly. To find the largest number we could use MAX or LARGE, but we get fewer errors if we pick AGGREGATE(14,6,..,1). (The 14 means "we want the kth largest number", the 6 means "ignore error values", and the 1 is k, so "we want the largest number, ignoring errors".)
But, what list of numbers are we going to look at, I don't hear you ask. Well, we want the ROW of each cell in our range (I'm using $B$1:$B$100, because using the whole of column B would take far too long to calculate repeatedly), a comparison against the current ROW(), and a check that the LENgth is > 0. Those last two are comparisons, so let's write them out first:
ROW($B$1:$B$100)<=ROW()
and
LEN($B$1:$B$100)>0
We want to use -- to convert TRUE and FALSE to 1 and 0 - this means that any "bad" values become 0, and any "good" values are larger than 0:
ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0)
This gives us the row number when the row is on-or-before the current row AND column B is not blank. If either of those is FALSE, then we get 0 instead. Stick that in the AGGREGATE to find the largest number:
AGGREGATE(14, 6, ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0), 1)
Then put it in a SUMPRODUCT to force Excel to treat it as an array formula, and that's your NON_BLANK. This then gives you that first formula right at the top of the post.
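For readers who find the nested formula hard to follow, the same logic can be sketched in Python. This is only an illustration of the arithmetic, not Excel's evaluation model: the function names and the sample column are hypothetical, blanks are represented as empty strings, and rows are numbered from 1 as in the worksheet.

```python
def last_non_blank_row(column, current_row):
    """Return the 1-based row number of the last non-blank cell on or above
    current_row, or 0 if there is none.

    Mirrors ROW(range) * --(ROW(range) <= ROW()) * --(LEN(range) > 0),
    followed by taking the largest product.
    """
    products = [
        row * (row <= current_row) * (len(value) > 0)
        for row, value in enumerate(column, start=1)
    ]
    return max(products)


def lookup_above(column, current_row):
    """Mirror OFFSET(cell_in_current_row, NON_BLANK - ROW(), 0)."""
    non_blank = last_non_blank_row(column, current_row)
    if non_blank == 0:
        raise ValueError("no non-blank cell on or above this row")
    offset = non_blank - current_row  # zero or negative: stay or look upwards
    return column[(current_row - 1) + offset]


# Hypothetical column B: a header, then parent labels with blanks below them.
column_b = ["Region", "North", "", "", "South", ""]
print(lookup_above(column_b, 2))  # North
print(lookup_above(column_b, 4))  # North
print(lookup_above(column_b, 6))  # South
```

Multiplying by the two comparisons zeroes out every row that is either blank or below the current row, so the maximum that survives is exactly the row to read from.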

Lookup most used text in range based on criteria

I have the following CSE formula to return the most used text in a range, excluding empty cells.
=INDEX(A4:D4,MODE(IF(A4:D4<>"",MATCH(A4:D4,A4:D4,0))))
My problem is that the formula returns #N/A when there is only one value in the range. How can I adjust the formula to return that value?
If the only problem is the case where a single cell is filled, and you want to return that cell's value, use this CSE formula:
=IF(COUNTIF(A4:D4,"*")=1, INDEX(A4:D4,MATCH(FALSE,ISBLANK(A4:D4),0)),INDEX(A4:D4,MODE(IF(A4:D4<>"",MATCH(A4:D4,A4:D4,0)))))
Otherwise, to handle the general case where all values are distinct and there is no mode, you can count the distinct values and compare that count with the number of columns. If the two are equal there is no mode, so use the IF to handle that case in its TRUE branch:
=IF(SUMPRODUCT(1/COUNTIF(A4:D4,A4:D4))=COLUMNS(A4:D4),"Do Something",INDEX(A4:D4,MODE(IF(A4:D4<>"",MATCH(A4:D4,A4:D4,0)))))
Again, this is a CSE formula, so enter it with Ctrl + Shift + Enter.
This part of the formula above counts the distinct values:
SUMPRODUCT(1/COUNTIF(A4:D4,A4:D4))
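Why summing 1/COUNTIF counts distinct values may be easier to see in a short Python sketch: each value contributes one over its frequency, so every group of equal values adds up to exactly 1. The function names are hypothetical, blanks are represented as empty strings, and returning None when there is no mode is an assumption standing in for the "Do Something" branch.

```python
from collections import Counter


def count_distinct(values):
    """Mirror SUMPRODUCT(1/COUNTIF(range, range)) for a range without blanks."""
    counts = Counter(values)
    # round() guards against floating-point noise such as 2.9999999999999996
    return round(sum(1 / counts[value] for value in values))


def most_used_text(values):
    """Return the most used non-blank text, handling the awkward cases.

    - exactly one non-blank value: return it
    - all non-blank values distinct: no mode, return None
    - otherwise: return the most frequent value
    """
    non_blank = [value for value in values if value != ""]
    if len(non_blank) == 1:
        return non_blank[0]
    if not non_blank or count_distinct(non_blank) == len(non_blank):
        return None
    return Counter(non_blank).most_common(1)[0][0]


print(count_distinct(["a", "b", "a", "c"]))    # 3
print(most_used_text(["a", "b", "a", ""]))     # a
print(most_used_text(["x", "", "", ""]))       # x
print(most_used_text(["a", "b", "c", "d"]))    # None
```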

sum function in GREL in OpenRefine

In OpenRefine, I'm trying to increase the value of every number in a column by 1.
The GREL expression sum([value],1) gives me Error: sum expects an array of numbers.
I guess I don't know how to produce an array of numbers. When I use a different function on the same column, such as tan([value]), I get the result I want.
I think you misunderstood the use of sum(). If you just want to add 1 to each cell, just use value + 1.
Make sure, however, that your column contains numbers (in green) and not strings (in black). If in doubt, use toNumber(value) + 1 instead.
The sum() function adds up all the numbers contained in an array, for instance sum([1,2,3,4]) = 10, but there is no array here, because each cell of your column contains a single number.
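The difference between a per-cell increment and an array sum can be illustrated in Python. This is an analogy only: to_number is a hypothetical, rough stand-in for GREL's toNumber() and may not match its behaviour for every input.

```python
def to_number(cell):
    """Convert a cell to a number if it is stored as text (rough toNumber analog)."""
    if isinstance(cell, (int, float)):
        return cell
    try:
        return int(cell)
    except ValueError:
        return float(cell)


# Hypothetical column mixing real numbers and numbers stored as strings.
column = [1, "2", 3.5]

# Per-cell transform, like `toNumber(value) + 1` applied to every row.
incremented = [to_number(cell) + 1 for cell in column]
print(incremented)        # [2, 3, 4.5]

# sum() works on a whole array at once, like GREL's sum([1,2,3,4]).
print(sum([1, 2, 3, 4]))  # 10
```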

add numbers down a column in OpenRefine

I'd like to automatically number a column, similar to Excel, where I can type "1" in one cell and the cells below it automatically get numbered 2, 3, 4, 5, etc. I don't know why I'm having so much trouble figuring out this function in OpenRefine, but any help would be greatly appreciated.
Thanks,
Gail
You can add a new column ("Add new column based on this column") with this GREL expression:
row.index + 1
The answer by Ettore Rizza already provides a solution for the common case. As the question author stated in a comment, it does not work for their use case: they want to add consecutive numbers only to the rows that remain after filtering.
For this you can use records. The basic idea is to create records from the filtered data and use the record index as counter.
Steps:
With filters active add a new column with the expression value.
Move the new column to the beginning to use it as records.
With filters still active add a new column (or transform the first one) with the expression row.record.index + 1.
Original  Filtered  Records  Index
A         A         A        1
1
2
B         B         B        2
C         C         C        3
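The idea behind the table can be sketched in Python, assuming that a new record starts at every row whose first (key) column is non-blank and that rows with a blank key belong to the record above them. The function name and data are hypothetical, and OpenRefine's handling of blank rows before the first key is not modelled.

```python
def record_indexes(key_column):
    """Return the 0-based record index for every row.

    A non-blank key starts a new record; a blank key stays in the current
    record. Rows before the first key get None (not modelled here).
    """
    indexes = []
    current = None
    next_index = 0
    for key in key_column:
        if key != "":
            current = next_index
            next_index += 1
        indexes.append(current)
    return indexes


# The "Records" column from the table: blank where the row was filtered out.
records_column = ["A", "", "", "B", "C"]
indexes = record_indexes(records_column)
print(indexes)  # [0, 0, 0, 1, 2]

# With the filter still active only the key rows are visible, so only they
# receive record index + 1.
numbering = [
    index + 1 if key != "" else ""
    for key, index in zip(records_column, indexes)
]
print(numbering)  # [1, '', '', 2, 3]
```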

Adding 1 to existing number in a cell based on value of another cell

Alright, so I looked through other solutions, but I didn't find anything close enough to make it work with my limited knowledge, so I'm hoping some geniuses here can help.
Basically, I am using Excel to auto-update some data based on the value of another cell. A simplified version of my table looks like this:
ID      Step          Count
526985  - Step 1      8
123569  + Step 3      3
589745  - Not in AMP  1
589465  + Step 2      5
IDs are unique and always 6 digits (just FYI, in case that helps). There will never be a Step or Count value without an ID value.
I would like to use a change event in VBA so that the count updates automatically as I go along.
The goal is for the user to not have to update manually the value in the "Count" column
When the user starts working on the sheet, the "Step" column will be blank and will be selected from a drop down menu but the "Count" and "ID" will be populated already
What I need:
When a value of "+ Step 1", "+ Step 2", "+ Step 3", "+ Step 3 ext", "- Step 2", "- Step 1" is selected in the "Step" column for an ID, I need "+1" added to whatever the current value is in the "count" column
When a value of "- Not in AMP" is selected from the "Step" column, I need the value to be 0 in the "Count" column
There will be other values selectable from the "Step" column which I need to be ignored (Keep the same "Count" column value)
After a step value has been selected in the "Step" column and the "Count" column has been updated, I still need to be able to go back and change that value to any other number manually.
I think that's about it. I thought of using formulas, which I could do, but the issue is that I need to be able to overwrite the value with another one, and doing so would delete the formula. I'm open to anything that makes this work, though. Thank you in advance!
Once you have a Change event in place, you could add some logic to check:
- if the user is entering a new value in the correct column (here, "Step"), load the previous data into a Variant and apply the rules you have given to populate the additional cells
- if not, leave the edit alone so the user can still update the values manually.
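The rules from the question can be written down as a small pure function before being ported to a VBA change handler. The sketch below is in Python and only covers the decision logic; the names INCREMENT_STEPS, RESET_STEP and updated_count are hypothetical, and hooking it up to the worksheet event is left out.

```python
# Step values that add 1 to the current count (taken from the question).
INCREMENT_STEPS = {
    "+ Step 1", "+ Step 2", "+ Step 3", "+ Step 3 ext",
    "- Step 2", "- Step 1",
}
# Step value that resets the count to 0.
RESET_STEP = "- Not in AMP"


def updated_count(step, current_count):
    """Return the new Count for a row after `step` was selected.

    Any step value not listed above leaves the count unchanged.
    """
    if step in INCREMENT_STEPS:
        return current_count + 1
    if step == RESET_STEP:
        return 0
    return current_count


print(updated_count("+ Step 3", 3))      # 4
print(updated_count("- Not in AMP", 1))  # 0
print(updated_count("On hold", 5))       # 5
```

Because the function is only called when the "Step" cell changes, a later manual edit of the "Count" cell is not overwritten, which is the behaviour a formula could not provide.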