Could you please advise what would be the best way to create a union column for 12 separate columns (located in 12 different Excel sheets within a workbook) with or without VBA?
There are good guides on how to do it for two columns without VBA (using the MATCH function); however, I am not sure how to approach the case with multiple columns.
I think this can be achieved with multiple consolidation ranges for a PivotTable. You would need labels for the columns and more than one column per sheet (you could clone the existing ones). The PivotTable should then sort the list and remove duplicates automatically (if cloned).
EDIT:
I'll assume your IDs are all numeric (otherwise, sorting would be very tricky, if not impossible, without VBA). You could modify the following array formula to meet your needs (select an area with enough rows to hold the full stack of IDs, enter the formula, then commit it with Ctrl+Shift+Enter):
=SMALL(IFERROR(CHOOSE(COLUMN(INDIRECT("C1:C12",FALSE)),Sheet1!A1:A73,Sheet2!A1:A70,Sheet3!A1:A79,Sheet4!A1:A58,Sheet5!A1:A51,Sheet6!A1:A94,Sheet7!A1:A50,Sheet8!A1:A89,Sheet9!A1:A75,Sheet10!A1:A89,Sheet11!A1:A70,Sheet12!A1:A94),FALSE),ROW(INDIRECT("1:"&COUNT(Sheet1!A1:A73,Sheet2!A1:A70,Sheet3!A1:A79,Sheet4!A1:A58,Sheet5!A1:A51,Sheet6!A1:A94,Sheet7!A1:A50,Sheet8!A1:A89,Sheet9!A1:A75,Sheet10!A1:A89,Sheet11!A1:A70,Sheet12!A1:A94))))
I'll use a smaller version (2 columns) to explain how it works:
=SMALL(IFERROR(CHOOSE(COLUMN(A1:B1),A1:A73,C1:C70),FALSE),ROW(1:143))
First, COLUMN(A1:B1) returns the horizontal array {1,2}. Passing this to the CHOOSE function with the two single-column ranges creates a single 73 x 2 array from both A1:A73 and C1:C70 (instead of creating a jagged array, the last three values of the second column are filled in with #N/A).
Wrap the result with IFERROR to convert the three #N/A values to FALSE (otherwise, SMALL will return an error).
Next, ROW(1:143) returns a vertical array of integers between 1 and 143. Passing the 73 x 2 array and the array of integers between 1 and 143 to SMALL will return a single 143 x 1 array (vertical) of the sorted values (the three FALSE values are ignored).
Note on INDIRECT: Using INDIRECT in this way keeps the formula stable even if rows/columns are deleted; however, it also makes the formula volatile, so it is recalculated every time anything in the workbook changes, which could slow things down considerably. Another option is INDEX (e.g., ROW(A1:INDEX(A:A,COUNT(...)))), which can be affected by row/column deletions but isn't volatile.
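To make the three steps above (pad, replace errors, take the k-th smallest) easier to follow, here is a minimal sketch of the same logic in Python rather than Excel. The function name `stack_and_sort` and the sample values are made up for illustration; `None` stands in for the #N/A padding.

```python
def stack_and_sort(columns):
    """Mimic the CHOOSE / IFERROR / SMALL steps on plain Python lists."""
    height = max(len(col) for col in columns)
    # CHOOSE pads the shorter columns to a rectangle; None plays the role of #N/A.
    padded = [list(col) + [None] * (height - len(col)) for col in columns]
    # IFERROR turns the padding into FALSE, which SMALL skips;
    # here we simply keep the numeric entries.
    numbers = [
        value
        for col in padded
        for value in col
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    ]
    # SMALL(array, {1, 2, ..., n}) returns the k-th smallest value for each k,
    # which is the same as listing the values in ascending order.
    return sorted(numbers)


print(stack_and_sort([[5, 3, 9], [4, 3]]))  # [3, 3, 4, 5, 9]
```

Note that, like the formula, this sorts the stacked values but does not remove duplicates.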
If you don't mind a bit of manual effort, this works for numeric and non-numeric IDs:
Stack columns on top of each other manually using Ctrl-C + Ctrl-V
Go to Data tab --> Filter --> Advanced Filter --> tick unique records only --> choose your copy to location
This simple two-step process then gives you the unique union of the stacked columns. Obviously, the more columns there are, the more useful a VBA approach becomes.
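The stack-then-deduplicate idea can be sketched outside Excel as follows. This is only an illustration in Python, not what Advanced Filter does internally; `unique_union` and the sample IDs are hypothetical.

```python
def unique_union(columns):
    """Stack the columns in order and keep the first occurrence of each ID."""
    seen = set()
    result = []
    for col in columns:
        for value in col:
            if value not in seen:
                seen.add(value)
                result.append(value)
    return result


sheets = [["a1", "b2"], ["b2", "c3"], [7, "a1"]]
print(unique_union(sheets))  # ['a1', 'b2', 'c3', 7]
```

Because it only tests for equality, the sketch handles numeric and text IDs alike, and it scales to 12 columns without extra effort.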
This follows on from "Excel Count unique value multiple columns".
I am trying to filter and set up a table containing all the unique combinations of message types.
So with three message types as an example below, I want to create a table with all the possible flows from this.
So every time MessageA exists, it is either followed by a MessageA, MessageB, MessageC or is the last of the sequence.
And every time we see MessageC, it is only followed by MessageA.
On the left is the data, and on the right is the desired result.
I want this to be able to scale to multiple columns/rows.
You could do it by comparing two offset ranges, A1:D5 and B1:E5
=SUMPRODUCT(($A$1:$D$5=$G2)*($B$1:$E$5=K$1))
As you can see, I have cheated slightly by setting K1 blank so it compares correctly with column E, but this could be made part of a longer formula if it was necessary to have END as the column header for K.
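The offset-range comparison amounts to counting, for every cell, which value sits immediately to its right. A minimal Python sketch of that counting is below; the function name, the `"END"` label, and the sample rows are assumptions, and blank cells are assumed to appear only at the end of a row.

```python
from collections import Counter


def count_transitions(rows, end_label="END"):
    """Count (message, next message) pairs; the last message in a row is followed by end_label."""
    counts = Counter()
    for row in rows:
        messages = [cell for cell in row if cell]  # drop trailing blank cells
        for current, following in zip(messages, messages[1:] + [end_label]):
            counts[(current, following)] += 1
    return counts


rows = [
    ["MessageA", "MessageB", ""],
    ["MessageA", "MessageA", "MessageC"],
    ["MessageC", "MessageA", ""],
]
flows = count_transitions(rows)
print(flows[("MessageA", "MessageB")])  # 1
print(flows[("MessageC", "MessageA")])  # 1
print(flows[("MessageB", "MessageC")])  # 0
```

Each key of the counter corresponds to one cell of the result table (row header, column header).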
I have a big file with data, updated weekly, from which a VBA script copies a lot of columns of various lengths and starting points, and then pastes these columns one by one into another file.
My question is how to best store the cell references that the script needs to be able to copy the correct columns? Currently there is a bunch of arrays storing the starting row number, starting column number, sheet number etc which are all indexed the same, and a loop function which does the actual copy paste work.
This (exceptionally bad?) solution would obviously be an absolute nightmare if the source file changed even slightly at some point. So how could one do it better?
Excel uses enumerations for this purpose. Enums are an efficient way to assign names to constants.
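Private Enum WeeklyReport       ' omit Private to make the enum available project-wide
    FirstRow = 3
    StartColumn = 1             ' Columns:
    Text                        ' = 2 (previous value + 1)
    Values                      ' = 3
    Totals = 17
    Remarks                     ' = 18
End Enum
The above declaration specifies the enum WeeklyReport with six members: one row constant (FirstRow) and five column constants. Text has the value 2 because, when a value is omitted, the previous value is incremented by 1. For the same reason, Values = 3 and Remarks = 18.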
You can call up the values by their full name, like, WeeklyReport.Remarks or by their short name, like Remarks. That's why Excel gives unique names to its enumerations, like xlUp, which you might be using all the time without even knowing the enumeration's name. The names given above are possible but not very good.
Enumerations are a data type of their own which is interchangeable with Long. Declared before any code, at the top of a code module, they are available throughout that module if Private, otherwise throughout the project.
I have a sheet that shows the maximum values spent anywhere. I need to find the most expensive place and return its name. Like this:
Whole sheet.
Function.
Function in text:
=IFS((A6=MAX(D2:D31)),(INDEX(C2:C31,MATCH(A6,D2:D31,0))),(A6=MAX(H2:H31)),(INDEX(G2:G31,MATCH(A6,H2:H31,0))),(A6=MAX(K2:K31)),(INDEX(K2:K31,MATCH(A6,L2:L31,0))))
Basically, I need to find the word to the left of the value that matches cell A6.
Thanks in advance.
OK, that's overcomplicated!
Firstly, why the three rows? It's a lot easier if you just have one long row with all the data (tell me if you actually need three and I'll change my solution).
=LOOKUP(MAX(D2:D31);D2:D31;C2:C31)
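The MAX function finds the biggest value in the list, and LOOKUP then matches it to the name. Note that LOOKUP expects the lookup range to be sorted in ascending order; if D2:D31 is unsorted, an exact-match alternative such as =INDEX(C2:C31;MATCH(MAX(D2:D31);D2:D31;0)) is more reliable.
Please note: if more than one object has the maximum price, only one of them is returned. The only way I can think of to get around that would be to build a macro.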
EDIT:
Alright, the multi-column solution is ugly and requires extra columns, which you can just hide.
As you can see, you'll need two new columns that find the highest value for each row and two new columns that find the name for each of these "highest" values (in this case tree and blueberries). Your visible answer is then simply an IF statement that finds out which one is bigger and gives the final verdict. This can be expanded to any number of columns, but the complexity increases.
Here are the formulas:
A5: =MAX(H2:H31)
B5: =LOOKUP(A5;H2:H31;G2:G31)
C5: =MAX(L2:L31)
D5: =LOOKUP(C5;L2:L31;K2:K31)
Result: =IF(A5>C5;B5;D5)
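The helper-column approach boils down to "find the largest amount across all name/value column pairs and return the name next to it". A short Python sketch of that logic is below; the function name and the sample data are invented for illustration and do not come from the sheet.

```python
def most_expensive(blocks):
    """blocks: list of (names, amounts) pairs, one per name/value column pair.

    Returns (name, amount) for the largest amount; on ties the first one found wins.
    """
    best_name, best_amount = None, None
    for names, amounts in blocks:
        for name, amount in zip(names, amounts):
            if best_amount is None or amount > best_amount:
                best_name, best_amount = name, amount
    return best_name, best_amount


blocks = [
    (["cafe", "tree"], [12, 80]),        # e.g. columns G:H
    (["blueberries", "bus"], [45, 3]),   # e.g. columns K:L
]
print(most_expensive(blocks))  # ('tree', 80)
```

Adding a third column pair only means adding another entry to `blocks`, which is the part that gets awkward with nested worksheet formulas.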
I'm stuck with Excel/VBA:
I've got a 10 row x 30 column blank array in Excel. I am trying to distribute 10 integers from a known group of 10 (say 1,1,1,1,1,1,3,5,7,9) into each column randomly so that each row of the column contains one of the group (and all of the group members are used once), and I need the second column to contain another random distribution of the same group and so on.
So I'd end up with 30 columns of 10 rows each, with each column containing a different random distribution of the same 10 integers. I want to be able to change the distribution in each row by recalculating the spreadsheet too.
Is there a quick way to do this? Short of arranging 30 different rand() sorted lists and using lookups I couldn't see a way. I'm not savvy enough with VBA to have a go. If someone can point me in the right direction, I'd be eternally grateful!
Perhaps I'm missing something obvious, though this does not seem to be so straightforward using worksheet formulas alone.
If your original list of values is in A1:A10, then, in B1:
=INDEX($A$1:$A$10,RANDBETWEEN(1,10))
and in B2, array formula**:
=INDEX($A$1:$A$10,INDEX(MODE.MULT(IF(COUNTIF($A$1:$A$10,$A$1:$A$10)-COUNTIF(B$1:B1,$A$1:$A$10),{1,1}*ROW($A$1:$A$10))),RANDBETWEEN(1,10-ROWS($1:1))))
Copy the above down to B10.
You can then copy the formulas in B1:B10 to the right as desired.
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
You could write a loop that starts from an array holding your 10 numbers. Then loop through the 30 columns, each time adding a second column of 10 randomly drawn numbers to your array. See this website on how to draw random numbers. Then sort the array on the second column and output the first column.
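The sort-by-a-random-key idea described above can be sketched like this. It is written in Python purely to show the logic (a VBA version would follow the same steps); `random_columns` is a hypothetical name, and the group of integers is the one from the question.

```python
import random


def random_columns(group, n_columns, rng=random):
    """Return n_columns independent random orderings of the same group of values."""
    columns = []
    for _ in range(n_columns):
        # Attach a random key to every value (the "second column") ...
        keyed = [(rng.random(), value) for value in group]
        # ... sort on that key ...
        keyed.sort(key=lambda pair: pair[0])
        # ... and output the values in their new order (the "first column").
        columns.append([value for _, value in keyed])
    return columns


group = [1, 1, 1, 1, 1, 1, 3, 5, 7, 9]
columns = random_columns(group, 30)
print(len(columns), len(columns[0]))  # 30 10
```

Every column uses each member of the group exactly once, and calling the function again gives a fresh distribution, which corresponds to recalculating the spreadsheet.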
Edit:
As I read in the comments on the other answer, the purist solution would be to:
Assign each unique option of values a random value
Sort these random values either from top to bottom or bottom to top, and select the top one.
Place it in the first row
Do the same thing again for the second row, but keep a running count of how often each unique option has been used, so as to rule out an option once it has reached its maximum number of occurrences.
Edit2:
Just after I clicked post, I thought this through a bit more and came to the conclusion that the last digit will almost always be 1 in this case.
There could be quite a simple solution to this, but I am trying to find the number of times a unique variant (i.e. non-duplicates) of a string appears in a column. However this string is only part of the text contained in a cell, and not the entire cell. To illustrate:
EuropeSpainMadrid
EuropeSpainBarcelona
AsiaChinaShanghai
AsiaJapanTokyo
EuropeEnglandLondon
EuropeSpainMadrid
I would like to find how many unique instances there are of a string that contains "EuropeSpain". So using this example, I would find that a variant of "EuropeSpain" appears only twice (given that the second instance of "EuropeSpainMadrid" is a duplicate).
One solution is to use pivots to summarise the data and remove duplicates; however, given that my underlying dataset changes often, this would require manual adjustments and corrections. I would therefore like to avoid adding any intermediate steps (i.e. PivotTables, other data sets etc.) between my data and the counts.
UPDATE: I now understand to use wildcards to solve the first part of my question (counting the occurrences of "EuropeSpain"), however I am not yet clear on the second part of my question (how to find the number of unique occurrences).
Is there a formula or VBA code that could do this?
Using wildcards:
=COUNTIF(A1:A6,"="&"*"&C1&"*")
For a solution without VBA but with some versatility, I suggest the following, with Text in column A (labelled), column B labelled Flag, and EuropeSpain in C1:
=FIND(C$1,A2)
in B2 copied down.
Then pivot A:B with Flag for FILTERS (and 1 selected), Text for ROWS and Count of Text for Sigma VALUES.
Apply Distinct Values if required (and available!). Alternatively, a formula of the kind:
=MATCH("Grand Total",E:E)-4
would count uniques.
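For comparison, the "flag, then count distinct" logic can be expressed very compactly outside Excel. The sketch below is in Python and uses the sample data from the question; the function name is made up, and the match is case-sensitive, as with FIND.

```python
def count_unique_containing(cells, fragment):
    """Count the distinct cell values that contain the given text fragment."""
    return len({cell for cell in cells if fragment in cell})


data = [
    "EuropeSpainMadrid",
    "EuropeSpainBarcelona",
    "AsiaChinaShanghai",
    "AsiaJapanTokyo",
    "EuropeEnglandLondon",
    "EuropeSpainMadrid",
]
print(count_unique_containing(data, "EuropeSpain"))  # 2
```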