Excel VBA - Adding keys to duplicated couples - vba

I'm working on a table in Excel with two columns with repeated values (text), and I need to create a new column (same sheet), where each (sorted) couple is associated with an integer.
Here a simplified example:
--> Starting point
--> Expected output
Since the number of rows is really huge (not known a priori - data are imported from external files), I need to write the code in a very efficient way!
All suggestions are warmly welcomed!

Manual method
Manually sort the columns
Insert 1 into cell C1
Insert =IF(AND(A2=A1,B2=B1),C1,C1+1) into cell C2 and copy this cell down.
VBA method (via macro recorder)
Start recording a macro
Manually do all 3 steps from above (manual method)
Stop macro recording
Now you have a basic macro that you can modify to your needs and learn from.

Related

Formatting Data in excel sheet with blue prism

I'm trying to run a duplicate check In which varying data is pulled from a website and compared to a master list, the master list being stored in Excel. The information from the website is read from a table in which has line breaks. These breaks are translated over to the data collection they are initially stored in. Some of the data from the website us eventually written to the master list in Excel. So when I read the master list back into Blue Prism to run a duplicate check, the rows that have line breaks are written into a collection as multiple rows (ex. I should have on 7 rows in my collections but am getting 42). Since the rows are not EXACTLY the same between the 2 collections, when it runs the automation does not recognize the duplicates.
The easiest way to solve this would be if I could make the collection rows have no line breaks as soon as the data is read. I've attempted to use the calculation stage to do so with no luck. I'm not sure if it is actually possible to do this, but would appreciate any direction.
Record an Excel macro to do the data sorting/cleaning in Excel (possibly Text To Columns, etc..) and then include the running of the macro as part of your Blue Prism process by using an action stage and the MS Excel VBO - Run Macro. Get the process to create an Excel instance (and create a handle data item from that stage), then use Open Workbook (whatever workbook you store your Macro in) and then use the MS Excel VBO - Run Macro (use the same handle created earlier and type in the name of the "macro").
It sounds like what is happening is that the MS Excel VBO is grabbing the data from the Excel Worksheet wholesale.
This is to say that it's accessing your Worksheet table, copying the cell values BUT not the cell formatting data, and then dumping the values into a BP collection.
Since it did not bring along any of the original cell formatting data to reference when it went to populate the collection it's just breaking up the values based on crturn/line breaks. Thus, your collection is organized based on that, and not on the original Worksheet cell.
So, with that said, on to a solution!
Solution 1
Brute force the organization of the incoming Excel cell data to the collection by looping over the Excel Worksheet cell-by-cell.
Run a loop, and in that loop have BP go into the Excel Worksheet and grab the first populated cell it comes across. Run a formatting/cleanup Calculation stage over the data. Dump the cell value into a single collection field.
Repeat.
This is...inelegant, expensive at best, and not at all recommended for any medium to large dataset. But it's definitely the best way to do string manipulation and value comparisons before it hits your collection. Since it sounds like your using a Master template then you as-well know what the expected format of your data should be.
This method will enable you implement Trim(), Concat(), or Split() in a Calculation stage to better organize your incoming data before you dump it into a collection.
This is also basically what I think you're already trying to do, but cell-by-cell instead of Worksheet row-by-row or table-by-table.
Solution 2
Clean up the table data you grab from the website before you dump it into the Excel Worksheet.
This is basically Solution 1, but in reverse. Simply format/cleanup your data before it hits you Excel Worksheet.
I'm not sure this is any better than Solution 1, but, you know, it's something...
Solution 3
Format the cell data IN the MS Excel Worksheet itself.
Basically rearrange the cells and cell data in the Excel Worksheet into a more predictable format by using the Split, Trim, Merge, or other actions included in the MS Excel VBO. You can also do this using the Data - OLEDB utility object, but that requires some pretty solid understanding of SQL syntax.
This would look like this using the MS Excel VBO:
Grab the Excel Worksheet data wholesale and dump into a collection
Count the rows/fields of the collection
Is that number consistent with the desired/expected format of your data?
If not, have the bot go back into the Excel Worksheet and reformat the cells by removing any carriage returns/line breaks/whatever else
Repeat.
However, I'm always reluctant to reformat any original source, as it's then hard to figure out what wrong and where it went wrong when you've changed the original structure of your data. So it's best to always make a copy of the Worksheet before you make any manipulation.
Unfortunately I don't have access to my BP environment at the moment or I'd provide you with the act object actions you'd need to do any of this, my bad. Once I do I'll update this answer.

Workaround for either my approach or the 256 character limit in Excel VBA

My problem is as follows, I have a workbook with (to begin with) 2 worksheets, the first (called WIP) acts as a form of data entry, each new occurrence of the BoM requires an insertion of 4 columns which happens to the left of the existing columns. At the same time a new worksheet is created based on a copy of the existing second worksheet (called FitOut) which pulls various bits of a data from the first worksheet based mainly upon the version of the BoM selected and the supplier referenced.
Of course adding new columns to the WIP sheet causes the functions, arrays and formulas in the sheets to automatically update, I had used a quick workaround by using some code to hold and then paste the new occurrences data into the worksheet which is created at the start of the macro, however the formulas have become slightly complex (due to the need to look for the previous 4 occurrences and return values based on specific cell locations) that the 256 character limit has been completely shot ( I think I'm over 800 on some bits).
I've very limited as to the layout of the WIP sheet, and the sheet needs to be fairly idiot proof (hence macros, buttons etc) but it needs to run well...
ANy and all suggestions/help would be much appreciated.
I have put an example of the formula I am trying to use, if it can be condensed further pleas let me know:
=IFERROR(IF($C$1='WIP'!$U$1,(INDEX('WIP'!$A$1:$X$2500,SMALL(IF('WIP'!$U$1:$U$2500=$A$1,ROW('WIP'!$U$1:$U$2500)),ROW(6:6)),COLUMN('WIP'!$C:$C))),(IF($C$1='WIP'!$X$1,(INDEX('WIP'!$A$1:$AA$2500,SMALL(IF('WIP'!$X$1:$X$2500=$A$1,ROW('WIP'!$X$1:$X$2500)),ROW(6:6)),COLUMN('WIP'!$C:$C))),(IF($C$1='WIP'!$AA$1,(INDEX('WIP'!$A$1:$AD$2500,SMALL(IF('WIP'!$AA$1:$AA$2500=$A$1,ROW('WIP'!$AA$1:$AA$2500)),ROW(6:6)),COLUMN('WIP'!$C:$C))),(INDEX('WIP'!$A$1:$AG$2500,SMALL(IF('WIP'!$AD$1:$AD$2500=$A$1,ROW('WIP'!$AD$1:$AD$2500)),ROW(7:7)),COLUMN('WIP'!$C:$C)))))))),"")

Duplicated cells skip 10 rows

I have an Excel spreadsheet with two sheets, Sheet 1 contains some text and formulas I wish to duplicate down to write 2,000 odd lines of C# code for my project. Extremely repetitive, so I thought I could use Excel to write it for me. Sheet 2 contains an extract from my database which I wish to use to populate that values with. My section of Excel code looks like this and is spread out over 10 rows and 5 columns:
new AccountingPeriod()
{
MonthCovered="=Sheet2!B2",
StartDate=DateTime.Parse("Sheet2!C2"),
EndDate=DateTime.Parse("Sheet2!D2"),
AccountingPeriodDescription="Sheet2!E2",
Active='=Sheet2!F2',
April='=Sheet2!G2,
TaxYear="Sheet2!H2"
},
When I highlight these 10 rows and use the Excel duplicate tool I want the cell references (for example, in my spreadsheet Sheet1!D3 = Sheet2!B2, I want Sheet1!D13 = Sheet2!B3 NOT Sheet2!B13) to increment by 1, not 10 as it's doing in my spreadsheet.
We have tried an alternative solution of writing a macro to insert 10 blank rows in Sheet2 for every populated row so that the duplicated references are correct in Sheet1 but we're currently failing miserably at getting that to work correctly.
Instead of a direct reference that is going to change on a 1-to-1 with the copy, use the INDEX function with a little maths to achieve the 10-to-1 row stagger.
In Sheet1!D3 this references Sheet2!B2.
=INDEX(Sheet2!B:B,INT(ROW(1:1)/10)+2)
In Sheet1!D13 this references Sheet2!B3.
The OFFSET function can accomplish the same thing but it is a volatile function that recalculates whenever anything in the workbook changes. INDEX provides the same functionality while remaining non-volatile and will only recalculate when something that affects its returned value changes.

Combine Columns in multiple Excel workbooks and auto-de-dupe

I have three workbooks with IDs in column A. I want to create a fourth workbook, which should combine the IDs and de-dupe them automatically so that I can perform a vlookup on them to reference data on the other workbooks. The 3 workbooks with data in them will be constantly updated with new ID numbers added, so I need the master/summary workbook to automatically grab newly added ID numbers and perform vlookups against the other workbooks.
The goal of this is to give a summary view of each record (which corresponds to a person), letting the user know which workbooks that person exists in.
I have tried doing =max() to retreive the number of ID's in each workbook, and combining them, telling me the total # of ID's that exist, combined. Then I tried to perform this: =SMALL(IF(FREQUENCY(Test1:Test2$A$2:$A$1000, ROW($1:$28))<>0, ROW($1:$28), ""), ROW(A1))
+ CTRL + SHIFT + ENTER
But I'm 1. not sure if that'll work and 2. not sure how the syntax works with 3 separate workbooks.
I also tried the union method in VBA with no luck - again I think I'm messing up the syntax.
You can retrieve a unique list of the id numbers in [ID_First.xlsx]Sheet1!$A$2:$A$999 using the following array formula in the master worksheet's A2 (needs a row above it to avoid circular references).
=IFERROR(INDEX([ID_First.xlsx]Sheet1!$A$2:$A$999,MATCH(0, IF(LEN([ID_First.xlsx]Sheet1!$A$2:$A$999),COUNTIF(A$1:A1,[ID_First.xlsx]Sheet1!$A$2:$A$999),1),0)),"")
If you stack similar formulas consecutively, passing calculation on to them with IFERROR(), you can gain a unique list from three separate workbooks.
=IF(LEN(A1),IFERROR(INDEX([ID_First.xlsx]Sheet1!$A$2:$A$999,MATCH(0, IF(LEN([ID_First.xlsx]Sheet1!$A$2:$A$999),COUNTIF(A$1:A1,[ID_First.xlsx]Sheet1!$A$2:$A$999),1),0)),IFERROR(INDEX([ID_Second.xlsx]Sheet1!$A$2:$A$999,MATCH(0, IF(LEN([ID_Second.xlsx]Sheet1!$A$2:$A$999),COUNTIF(A$1:A1,[ID_Second.xlsx]Sheet1!$A$2:$A$999),1),0)),IFERROR(INDEX([ID_Third.xlsx]Sheet1!$A$2:$A$999,MATCH(0, IF(LEN([ID_Third.xlsx]Sheet1!$A$2:$A$999),COUNTIF(A$1:A1,[ID_Third.xlsx]Sheet1!$A$2:$A$999),1),0)),""))),"")
Array formulas require Ctrl+Shift+Enter to finalize. Once entered correctly, fill down as necessary to collect all unique IDs.
With a unique list of id numbers, you can use the same method of nested IFERROR functions to look through a series of three workbooks for additional data.
=IFERROR(VLOOKUP($A2, [ID_First.xlsx]Sheet1!$A$2:$Z$999, 2, FALSE),IFERROR(VLOOKUP($A2, [ID_Second.xlsx]Sheet1!$A$2:$Z$999, 2, FALSE),IFERROR(VLOOKUP($A2, [ID_Third.xlsx]Sheet1!$A$2:$Z$999, 2, FALSE),"")))
I'm offering this as you've mentioned a total of 50 member IDs. This method can quickly (and logrythmically) eat up calculation resources when applied to larger groups of numbers.
If you've got Excel 2016 or later, you could unpivot the data using PowerQuery, which is now built in to Excel under the Get & Transform section of the Data tab in the ribbon. Plenty of examples of how to do this if you search Google for 'unpivot' and 'Powerquery'.
If you have Excel 2010 or 2013, you can download the PowerQuery addin for free from https://www.microsoft.com/en-nz/download/details.aspx?id=39379 (assuming IT let you do so).
PowerQuery is a revolution in Excel data transformation, and the learning curve is a lot less steep than advanced formulas or VBA.

Excel - How do I find all relevant rows by typing unique invoice# listed Col A

I have a Worksheet with 10 columns and data range from A1:J55. Col A has the invoice # and rest of the columns have other demographic data. Goal is to type the invoice number on a cell and display all the rows matching the invoice number from col A.
Besides auto filter function, the only thing comes to my mind is VBA. Please advice what is the best way to get the data. Thanks for your help in advance.
Alright, I'm pretty proud of this one. Again avoiding VBA, this one uses the volatile formula OFFSET to keep moving its VLOOKUP search down the table until it's found all matches. Just make sure you paste enough rows of the formula that if there are many matches, there's room for all of them to appear. If you put a border around your match area then it would be clear if you ever ran out of room and needed to copy down the formula some more.
Again, in the main section, it's just a single formula (using index):
=IFERROR(INDEX($A$1:$J$200,$M3,MATCH(N$2,$A$1:$J$1,0)),"")
This gets to be so simple because the hard work of the lookup is done by an initial column which looks up the next row that matches the invoice number. It has the formula:
=IFERROR(MATCH($L$2,OFFSET($A$1:$A$200,M2,0),0)+M2," ")
Here is the working example that goes with those formulas:
Let me know if you need any further description of how it works, but it mostly uses the same rules as above so that it's robust in copying and moving around.
I've uploaded the Excel file so you can play with it, but everything you need to reproduce this feature should be in this solution.
Google Docs - Click link and hit Ctrl+S to download and open in Excel.
A popular solution to this problem is a simple VLookup. Lookup the invoice the user types in on the table A1:J55, and then return an adjascent column's data.
Here's an example of it working:
The formula in the highlighted cell is:
=VLOOKUP($L3,$A:$J,MATCH(N$2,$1:$1,0),FALSE)
What's nice about this formula is you only need to type it once and then you can copy it across and it'll automatically pick out the correct column of the table (that's the match part). The rest is very simple:
The first part says lookup value $L3 (the invoice number typed in),
The second part says look it up in range $A:$J (which is where your table is located). I've shown how you can select the entire columns $A:$J so that you can add and remove data without worrying about adjustin the range in your lookups. (Excel takes care of optimizing the formula so that unused cells aren't checked)
The third part picks the column from which the resulting data will be drawn once a matching row is found.
The FALSE part is an indication that the invoice number must match exactly (no approximate matching allowed)
The $ signs ensure that fixed ranges like the location of your source table ($A:$J) and your lookup value ($L3) don't get automatically changed as you copy the formula across for multiple columns.
The formula is pretty easy to adapt if you want to move around your table and the area where you do your lookup. Here's an example:
Bonus
If you want to add a little spiff, you can add a dropdown to the Invoice # field so that the user gets auto-completion and the option to browse existing values like so: