add numbers down a column in OpenRefine - openrefine

I'd like to automatically number a column, similar to Excel, where I can type "1" in one cell and the cells below it automatically get numbered 2, 3, 4, 5, etc. I don't know why I'm having so much trouble figuring out this function in OpenRefine, but any help would be greatly appreciated.
Thanks,
Gail

You can add a new column ("Add new column based on this column") with this GREL formula:
row.index + 1

The answer by Ettore Rizza already provides a solution for the common case. As the question author stated in a comment, it does not work for their use case: they want to add consecutive numbers only to the rows that remain after filtering.
For this you can use records. The basic idea is to create records from the filtered data and use the record index as counter.
Steps:
1. With the filters active, add a new column with the expression value.
2. Move the new column to the beginning so that it defines the records.
3. With the filters still active, add a new column (or transform the first one) with the expression row.record.index + 1.
Original  Filtered  Records  Index
A         A         A        1
1
2
B         B         B        2
C         C         C        3
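As a rough illustration of the result the records approach produces, here is a minimal Python sketch (not GREL). It assumes the sample values from the table above and a hypothetical filter that keeps only alphabetic values; the function name and the filter are assumptions made for the example.

```python
def number_filtered_rows(values, keep):
    """Number only the rows that pass the filter; other rows get None."""
    counter = 0
    numbers = []
    for value in values:
        if keep(value):
            counter += 1          # next consecutive number for a kept row
            numbers.append(counter)
        else:
            numbers.append(None)  # row is hidden by the filter, no number
    return numbers


original = ["A", "1", "2", "B", "C"]
# Hypothetical filter: keep values made of letters only.
index = number_filtered_rows(original, keep=str.isalpha)
print(index)
```

The kept rows receive 1, 2, 3 in order, which matches the Index column of the table.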

Related

How to lookup if my lookup data has duplicate values?

I am trying to lookup values from Table 1 to Table 2 based on Col1 in Table 1.
The catch is that Table 1 has duplicate values (for example, A is repeated 3 times) but I don't want to duplicate the returned value from Table 2.
How can this be done through either excel or sql (e.g. LEFT JOIN)?
What SQL are you using? Are you familiar with CTE and partition?
Have a look here: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/597b876e-eb00-4013-a613-97c377408668/rownumber-and-cte?forum=transactsql
and here (see the answer and its 2nd comment): Select the first instance of a record
You can use those ideas to create another field that tells you whether the row is the first, 2nd, 3rd, etc. occurrence of Col1. E.g. you'd have something like
1 B Red 150
2 B Red 150
and you can then update col3 to be zero where this new field is not 1.
EDIT: since you asked about Excel: in Excel, sort by whatever criteria you may need (col 1 first, of course). Let's say that Col1 starts (excluding the heading) in cell C2. Set cell B2 =1. Then write this formula in cell B3:
=IF(C3=C2,B2+1,1)
and drag it all the way down. This will count the occurrences of col 1, i.e. it will tell you whether a given value is appearing in col1 for the first, 2nd, etc. time. You can then use it as the basis to change the value in other columns.
Also, it is not good practice to have a column where the first cell has a different formula from the others. You can use the same formula nesting another IF and referencing the row, so as to set one formula for the first row and one for the others.
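The occurrence counter and the "zero out everything except the first occurrence" step can be sketched in Python as follows. This is only an illustration of the logic, not SQL or Excel; the row layout (key, colour, amount) and the sample rows are made up for the example.

```python
def occurrence_numbers(sorted_keys):
    """Mirror =IF(C3=C2,B2+1,1): count repeats within runs of equal keys.

    The keys must already be sorted, as in the Excel approach.
    """
    numbers = []
    for i, key in enumerate(sorted_keys):
        if i > 0 and key == sorted_keys[i - 1]:
            numbers.append(numbers[-1] + 1)  # same key as the row above
        else:
            numbers.append(1)                # first occurrence of this key
    return numbers


def zero_repeats(rows):
    """Sort by key, then set the amount to 0 on every non-first occurrence."""
    rows = sorted(rows, key=lambda r: r[0])
    ranks = occurrence_numbers([r[0] for r in rows])
    return [
        (rank, key, colour, amount if rank == 1 else 0)
        for rank, (key, colour, amount) in zip(ranks, rows)
    ]


# Hypothetical sample data: (Col1, colour, amount)
rows = [("B", "Red", 150), ("A", "Blue", 100), ("B", "Red", 150)]
result = zero_repeats(rows)
for line in result:
    print(line)
```

Only the first "B" row keeps its amount of 150; the second one is set to 0.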

Extracting "hidden" data from expanding/collapsing pivot table - Excel

I'm not sure if this is possible but as you can see I have a pivot table with multiple dependent and expandable fields. I am trying to concatenate the data from columns A:D into one cell which works fine in row 2 but doesn't work with blank parent cells, as you can see in column F.
Any ideas for how to achieve this?
Pivot table
This answer assumes that you don't want to just Repeat All Item Labels in the PivotTable from the "Report Layout" drop-down on the PivotTable Tools "Design" tab.
A formula to get the first non-blank value on or above the same row as the current cell from Column B can be constructed with a combination of AGGREGATE, SUMPRODUCT and OFFSET, like so:
=OFFSET($B2,SUMPRODUCT(AGGREGATE(14,6,ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0),1))-ROW(),0)
How does it work?
Starting with the outermost part, OFFSET($B2, VALUE, 0) - this will start in cell B2, then look up or down by VALUE rows to get the value.
Next we need to know how many rows we will need to look up-or-down. Now, if we can work out the bottom-most row with data, we can subtract the current ROW() from that, giving us OFFSET($B2, NON_BLANK-ROW(),0)
So, to finish up we need to work out which rows are not blank, AND which rows are on-or-above our current row, then take the largest of those. This is going to take an ArrayFormula, but we can use SUMPRODUCT to make that calculate properly. To find the largest number we could use MAX or LARGE - but we get fewer errors if we pick AGGREGATE(14,6,..,1). (The 14 means "we want the kth largest number", the 6 means "ignore error values", and the 1 is k - so "we want the largest number, ignoring errors")
But, what list of numbers are we going to look at, I don't hear you ask. Well, we want the ROW for output from our range (I'm using $B$1:$B$100, because using the whole column B would take far too long to calculate repeatedly), a comparison against the current ROW(), and a check that the LENgth is > 0. Those last two are comparisons, so let's write them out first:
ROW($B$1:$B$100)<=ROW()
and
LEN($B$1:$B$100)>0
We want to use -- to convert TRUE and FALSE to 1 and 0 - this means that any "bad" values become 0, and any "good" values are larger than 0:
ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0)
This gives us the Row number when the Row is on-or-before the current row AND Column B is not blank - if either of those are False, then we get 0 instead. Stick that in the AGGREGATE to find the largest number:
AGGREGATE(14, 6, ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0), 1)
Then put it in a SUMPRODUCT to force Excel to treat it as an ArrayFormula, and that's your NON_BLANK. This then gives you that first formula right at the top of the post.
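For readers who find the nested formula hard to follow, the same idea (take the largest row index that is on-or-above the current row and is not blank) can be sketched in Python. This is only an illustration of the logic, with a made-up column of sample values and 0-based row indices rather than Excel row numbers.

```python
def last_non_blank_on_or_above(column, row):
    """Return the value of the nearest non-blank cell at or above `row`.

    `column` is a list of cell values and `row` a 0-based index.
    Returns None when every cell at or above `row` is blank.
    """
    # Row indices that are on-or-above the current row AND not blank,
    # i.e. the two conditions multiplied together in the formula.
    candidates = [
        i for i in range(row + 1)
        if column[i] not in (None, "")
    ]
    if not candidates:
        return None
    # The largest qualifying index, like AGGREGATE(14, 6, ..., 1).
    return column[max(candidates)]


# Hypothetical pivot-style column with blank repeated labels.
column = ["Region", "East", "", "", "West", ""]
filled = [last_non_blank_on_or_above(column, i) for i in range(len(column))]
print(filled)
```

Each blank cell picks up the label from the closest filled cell above it.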

Creating new rows by combining existing rows excel

I am fairly new here, so if this goes against the rules please tell me.
I have an issue that seems pretty simple but I wanted to check to make sure. I have been trying to see if I could create a new row by combining every variable from one column with another, like so:
Column 1  Column 2  Combined
A         1         A1
B         2         A2
          3         A3
                    B1
                    B2
                    B3
But instead of typing the combinations manually, I wanted the combined column to make these combinations without user input and to update automatically whenever a row is added to or removed from column 1 or 2. I have been trying to figure out if there is some way to use the CONCATENATE function in Excel or the & sign, but neither method seems to work. I was thinking of trying to code this in Visual Basic.
The main question: is this possible to do in excel? If so which function(s) could I use?
This assumes your data has one header row (row 1), Column 1 is column 'A' and Column 2 is column 'B'. Place the formula below in an empty cell and copy down as far as your data permits.
=INDEX(A:A,INT((ROW(A2)-2)/(COUNTA(B:B)-1))+2)&INDEX(B:B,MOD(ROW(A2)-2,COUNTA(B:B)-1)+2)
Now, if you want to add a little flag to let you know that you have more rows than you need for your data, you could use the following:
=IF(ROW(A2)-1>(COUNTA(A:A)-1)*(COUNTA(B:B)-1),"Data Exceeded",INDEX(A:A,INT((ROW(A2)-2)/(COUNTA(B:B)-1))+2)&INDEX(B:B,MOD(ROW(A2)-2,COUNTA(B:B)-1)+2))
According to: https://www.extendoffice.com/documents/excel/3097-excel-list-all-possible-combinations.html
You can use this formula:
=IF(ROW()-ROW($D$1)+1>COUNTA($A$1:$A$4)*COUNTA($B$1:$B$3),"",INDEX($A$1:$A$4,INT((ROW()-ROW($D$1))/COUNTA($B$1:$B$3)+1))&INDEX($B$1:$B$3,MOD(ROW()-ROW($D$1),COUNTA($B$1:$B$3))+1))
In the above formula, $A$1:$A$4 holds the first column's values and $B$1:$B$3 holds the second list's values, i.e. the two lists whose possible combinations you want to list. $D$1 is the cell in which you put the formula. You can change the cell references as needed.
In your case, you should use:
=IF(ROW()-ROW($C$2)+1>COUNTA($A$2:$A$3)*COUNTA($B$2:$B$4),"",INDEX($A$2:$A$3,INT((ROW()-ROW($C$2))/COUNTA($B$2:$B$4)+1))&INDEX($B$2:$B$4,MOD(ROW()-ROW($C$2),COUNTA($B$2:$B$4))+1))
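The INT/MOD arithmetic in these formulas is an index-based way of listing every pairing of the two columns. A minimal Python sketch of the same idea, using the sample values from the question (the function names are just for illustration):

```python
from itertools import product


def combine(first, second):
    """Every value of `first` paired with every value of `second`."""
    return [f"{a}{b}" for a, b in product(first, second)]


def combine_by_index(first, second):
    """Same result using the integer-division / modulo trick of the formulas.

    For output position k (0-based): k // n picks the item from `first`
    (the INT part) and k % n picks the item from `second` (the MOD part).
    """
    n = len(second)
    return [f"{first[k // n]}{second[k % n]}" for k in range(len(first) * n)]


combined = combine(["A", "B"], [1, 2, 3])
print(combined)
```

Both functions return the six combinations in the order shown in the question's Combined column.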

Adding a row between cell with the same value using VBA code

I'm new to VBA and want to know how I can format my table so that there is exactly one blank row between the names in column A.
Some of them have more than one blank row in between and some of them have none. I just need to format them so that every name in column A has one blank row in between. Any help would be appreciated!
Please note that I have thousands of rows of data, so doing it manually will not work.
I also tried filtering and converting them into a single block. The problem with a single block is that my column C has more than one piece of information connected to column A.
Here is an example: [image]

VBA to check for blank cells in columns based on value in another column

Given
O 1 2 3 A
A 4 5 6 B
B 7 8 9 D
O 3
C 15
T 18
I'm looking for VBA code to validate that when column A contains a value, the remaining columns also contain values, and that when it doesn't contain a value, columns 2 & 5 contain values but 3 & 4 don't.
I've simplified the example; in a real sheet there will be many more columns and rows to check.
I've considered COUNTIF and INDEX/MATCH and array formulas, but from my understanding these all work on a single column at a time.
I want to do something like WHEN A1:An<>"" THEN COUNTBLANK(B:E) ELSE COUNTA (C:D)
Is the best way to use an autofilter for blanks in A, then COUNTBLANK, and then a second autofilter for values in A?
Thanks
You can do it with a couple of nested IF formulae as follows:
=IF(A1<>"",
    "A not empty, "&IF(COUNTBLANK(B1:E1)=0,
        "B:E not blank",
        "B:E have blanks"),
    "A blank, "&IF(AND(COUNTBLANK(B1)+COUNTBLANK(E1)=0,
                       COUNTBLANK(C1)+COUNTBLANK(D1)=2),
        "Columns 2&5 have values and Columns 3&4 don't",
        "but condition not met"))
The reason for going down the VBA route is that I want a generic reusable function as opposed to a formula I copy between cells and sheets changing the columns etc along the way ending up with a lot of duplicate code.
So something that takes a column to test and a value to test it with. Third parameter would be a range of columns to validate, and the fourth parameter the validation.
I don't want any solution to have the columns hard coded and I don't want intermediate totals at the end of rows. This is fairly easily achieved in Excel itself...
The reason for trying to use countblank is that I can apply it to a range.
After a lot of searching I discovered this (the columns don't match the original example)
=SUMPRODUCT((A2:A19<>"")*(B2:D19=""))
=SUMPRODUCT((A2:A19="")*(D2:D19=""))
=SUMPRODUCT((A2:A19="")*(B2:C19<>""))
Nice huh? I just need to convert it into VBA now.
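Before converting to VBA, it may help to see the counting logic of those SUMPRODUCT formulas written as one reusable function. The sketch below is Python, not VBA, and the function name, parameter names and sample rows are all assumptions made for illustration. It takes the column to test, a test for that column, the columns to validate and the validation to apply, which is the parameter shape described above.

```python
def count_cells(rows, test_col, test, check_cols, check):
    """Count cells in `check_cols` that satisfy `check`,
    looking only at rows where `test(row[test_col])` is true.

    Roughly what =SUMPRODUCT((A2:A19<>"")*(B2:D19="")) computes.
    """
    return sum(
        1
        for row in rows
        if test(row[test_col])
        for col in check_cols
        if check(row[col])
    )


def blank(cell):
    return cell == ""


def filled(cell):
    return cell != ""


# Hypothetical sheet: columns A..D as list indices 0..3.
rows = [
    ["x", 1, "", 3],
    ["", "z", "", 9],
    ["y", "", "", ""],
]

# A has a value but B:D contains blanks.
blanks_where_a_filled = count_cells(rows, 0, filled, [1, 2, 3], blank)
# A is blank and D is blank.
d_blank_where_a_blank = count_cells(rows, 0, blank, [3], blank)
# A is blank but B:C contains values.
bc_filled_where_a_blank = count_cells(rows, 0, blank, [1, 2], filled)
print(blanks_where_a_filled, d_blank_where_a_blank, bc_filled_where_a_blank)
```

Each count is the number of offending cells, so a result of 0 means that rule is satisfied for the whole range.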
Thanks