Excel formula or VBA script to group data - vba

Here is a screenshot of a sample data set that I am trying to work with in Excel.
I want to use either an Excel formula or a VBA script to populate the firm_anamoly column (it's manually populated right now).
The logic is that for the set of rows with a given firm number, if there is more than one distinct "sector23code" in that set, the output in column "firm_anamoly" should be the value of "firm_count"; otherwise "firm_anamoly" should be set to 0.
As you can see for firm_number = 5, since the sector23codes include both 3 and 5, firm_anamoly is set to 3, i.e. firm_count.
I have around 500K rows of data that I am trying to work with.
Thank you.

There are 2 ways you can go about this. One way is to do it without converting your range to a table format.
Method 1:
You can enter this formula in cell D2:
{=IF(AND(IFNA(IF(A2=$A:$A,$B:$B,NA())=B2,TRUE)),0,C2)}
I believe this will give you the results you want, but it will probably overwhelm Excel on a less powerful system, so I would recommend Method 2 instead.
Method 2:
Convert your range to an Excel table. Then enter this formula in the first row of the 'firm_anamoly' column:
{=IF(AND(IFNA(IF([@[firm_number]]=[firm_number],[sector23code],NA())=[@sector23code],TRUE)),0,[@[firm_count]])}
This version should run much more efficiently than Method 1, because the table column references cover only the rows that contain data rather than entire worksheet columns.
Both of these are array formulas, so do not type the curly brackets yourself: enter the formula and press Ctrl + Shift + Enter, and Excel adds the brackets. Since you have so much data, you should definitely back up your workbook before entering either formula; array formulas on large data sets can sometimes crash Excel.
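To make the rule the formulas implement explicit, here is a minimal sketch in Python rather than Excel. It assumes each row is a (firm_number, sector23code, firm_count) tuple; the function name and the sample rows are made up for illustration, apart from firm 5, which follows the example in the question. It makes one pass to collect the distinct sector codes per firm and a second pass to fill the output, so it does not rescan the whole column for every row the way the array formulas do.

```python
from collections import defaultdict


def firm_anomaly(rows):
    """rows: list of (firm_number, sector23code, firm_count) tuples.

    Returns one firm_anamoly value per input row, in the same order.
    """
    # First pass: collect the distinct sector codes seen for each firm.
    sectors_by_firm = defaultdict(set)
    for firm_number, sector_code, _ in rows:
        sectors_by_firm[firm_number].add(sector_code)

    # Second pass: firm_count if the firm has more than one sector code, else 0.
    return [
        firm_count if len(sectors_by_firm[firm_number]) > 1 else 0
        for firm_number, _, firm_count in rows
    ]


# Hypothetical sample data; firm 5 mirrors the question (sector codes 3 and 5).
rows = [
    (4, 2, 2),
    (4, 2, 2),
    (5, 3, 3),
    (5, 3, 3),
    (5, 5, 3),
]
result = firm_anomaly(rows)
```

The same two-pass idea could be ported to VBA with a dictionary keyed on firm number, which may be worth trying if the array formulas prove too slow on 500K rows.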

Related

Formatting Data in excel sheet with blue prism

I'm trying to run a duplicate check in which varying data is pulled from a website and compared to a master list stored in Excel. The information from the website is read from a table that has line breaks. These breaks carry over to the data collection the values are initially stored in. Some of the data from the website is eventually written to the master list in Excel. So when I read the master list back into Blue Prism to run a duplicate check, the rows that have line breaks are written into a collection as multiple rows (e.g. I should have only 7 rows in my collection but am getting 42). Since the rows are not EXACTLY the same between the 2 collections, the automation does not recognize the duplicates when it runs.
The easiest way to solve this would be if I could make the collection rows have no line breaks as soon as the data is read. I've attempted to use the calculation stage to do so with no luck. I'm not sure if it is actually possible to do this, but would appreciate any direction.
Record an Excel macro to do the data sorting/cleaning in Excel (possibly Text To Columns, etc.) and then run that macro as part of your Blue Prism process by using an action stage with the MS Excel VBO - Run Macro. Get the process to create an Excel instance (and create a handle data item from that stage), then use Open Workbook (on whatever workbook you store your macro in), and then use the MS Excel VBO - Run Macro (use the same handle created earlier and type in the name of the macro).
It sounds like what is happening is that the MS Excel VBO is grabbing the data from the Excel Worksheet wholesale.
This is to say that it's accessing your Worksheet table, copying the cell values BUT not the cell formatting data, and then dumping the values into a BP collection.
Since it did not bring along any of the original cell formatting data to reference when it went to populate the collection, it's just breaking up the values based on carriage returns/line breaks. Thus, your collection is organized based on those breaks, and not on the original Worksheet cells.
So, with that said, on to a solution!
Solution 1
Brute force the organization of the incoming Excel cell data to the collection by looping over the Excel Worksheet cell-by-cell.
Run a loop, and in that loop have BP go into the Excel Worksheet and grab the first populated cell it comes across. Run a formatting/cleanup Calculation stage over the data. Dump the cell value into a single collection field.
Repeat.
This is...inelegant, expensive at best, and not at all recommended for any medium to large dataset. But it's definitely the best way to do string manipulation and value comparisons before the data hits your collection. Since it sounds like you're using a Master template, you also know what the expected format of your data should be.
This method will enable you to implement Trim(), Concat(), or Split() in a Calculation stage to better organize your incoming data before you dump it into a collection.
This is also basically what I think you're already trying to do, but cell-by-cell instead of Worksheet row-by-row or table-by-table.
Solution 2
Clean up the table data you grab from the website before you dump it into the Excel Worksheet.
This is basically Solution 1, but in reverse. Simply format/clean up your data before it hits your Excel Worksheet.
I'm not sure this is any better than Solution 1, but, you know, it's something...
Solution 3
Format the cell data IN the MS Excel Worksheet itself.
Basically rearrange the cells and cell data in the Excel Worksheet into a more predictable format by using the Split, Trim, Merge, or other actions included in the MS Excel VBO. You can also do this using the Data - OLEDB utility object, but that requires some pretty solid understanding of SQL syntax.
This would look like this using the MS Excel VBO:
Grab the Excel Worksheet data wholesale and dump into a collection
Count the rows/fields of the collection
Is that number consistent with the desired/expected format of your data?
If not, have the bot go back into the Excel Worksheet and reformat the cells by removing any carriage returns/line breaks/whatever else
Repeat.
However, I'm always reluctant to reformat any original source, as it's then hard to figure out what went wrong and where it went wrong once you've changed the original structure of your data. So it's best to always make a copy of the Worksheet before you make any manipulation.
Unfortunately I don't have access to my BP environment at the moment or I'd provide you with the exact object actions you'd need to do any of this, my bad. Once I do I'll update this answer.
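The cleanup step that all three solutions depend on can be sketched outside Blue Prism. The snippet below is Python, not Blue Prism configuration, and every name in it is hypothetical: it collapses carriage returns and line feeds inside each cell value into a single space, then compares the normalized rows so that a value with embedded line breaks still matches its single-line counterpart in the master list.

```python
import re


def strip_line_breaks(value):
    """Collapse CR/LF runs (and the spaces around them) into one space."""
    return re.sub(r"\s*[\r\n]+\s*", " ", value).strip()


def normalize_row(row):
    """Normalize every cell of a row so rows can be compared exactly."""
    return tuple(strip_line_breaks(cell) for cell in row)


def find_duplicates(website_rows, master_rows):
    """Return the website rows that already exist in the master list."""
    master_keys = {normalize_row(row) for row in master_rows}
    return [row for row in website_rows if normalize_row(row) in master_keys]


# Hypothetical data: the website value contains a line break, the master does not.
website_rows = [("Acme\r\nCorp", "42"), ("New Co", "7")]
master_rows = [("Acme Corp", "42"), ("Other", "1")]
duplicates = find_duplicates(website_rows, master_rows)
```

Whichever solution you pick, applying the same normalization to both collections before comparing is what makes the duplicate check reliable.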

Text Manipulation nestled within a Query (Or ArrayFormula) (Google Sheets)

I'm trying to Query some data in my spreadsheet, returning a manufacturer based on product code. We code our products with a three digit suffix that corresponds to different customers. I know the codes but people viewing the sheet may not.
Right now, I'm trying to split the suffix from the product code and perform the query in the same formula.
I can do this in two steps, splitting the suffix from the code and querying just the suffix, but I want to know if I can do this all in one formula. My current formula returns the data I want but it does not fill the entire range of the sheet. I would rather have this happen automatically as the workbook will be dynamic.
My current formula is:
=QUERY(CxSeries,"select B where C CONTAINS '"&right(Code,3)&"' ")
https://docs.google.com/spreadsheets/d/190kom4q0XOJP4UdLTJpZf5tuJCQTflcuokRp_FJ4pBc/edit?usp=sharing
I'm not sure if QUERY is the right way to go about this, but I'd prefer to stick to that (just because I honestly can't wrap my head around ArrayFormulas).
Thank you,
Clear all the formulas you have in column C and enter this in C7:
=ArrayFormula(vlookup(regexextract(D7:D16,"-(\d+)$")+0, {Sheet5!C6:C,Sheet5!B6:B}, 2, 0))
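For readers who find the ArrayFormula hard to parse, here is a rough Python equivalent of what it does for a single product code. The lookup table contents and the function name are hypothetical; only the regular expression is taken from the formula. It extracts the digits after the last hyphen, converts them to a number (the `+0` in the formula), and looks that number up in a suffix-to-manufacturer table.

```python
import re

# Hypothetical lookup table: numeric suffix -> manufacturer
# (plays the role of {Sheet5!C6:C, Sheet5!B6:B} in the formula).
manufacturers = {101: "Acme", 202: "Globex"}


def manufacturer_for(code, table):
    """Return the manufacturer for a product code, or None if not found."""
    # Same pattern as REGEXEXTRACT: the digits after a hyphen at the end of the code.
    match = re.search(r"-(\d+)$", code)
    if match is None:
        return None
    # int() mirrors the +0 that turns the extracted text into a number.
    return table.get(int(match.group(1)))


found = manufacturer_for("WIDGET-101", manufacturers)
```

ArrayFormula simply applies this same per-code step to every row in D7:D16 at once.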

How to subtract two cells of x row on excel

I'm making an Excel sheet to keep track of some activities. The thing is that I have 2 cells that are date type; I want the third cell to subtract them to get the time that person x spent on the activity.
I know that if I type =A2-A1 it's going to give me what I want, but since it's going to be a big Excel sheet with lots of records, I don't want to input the same formula for each row just changing the row number.
Is there a way to make Excel detect the row that the user is inputting data in and then apply the requested formula to get the time?
You can turn your data range into a table by highlighting the range, going to the Insert tab, and clicking Table. Then, when you type the formula into the first cell and click on the cells to select them instead of typing the references out, you will notice that the formula uses the column names instead. Excel will also automatically fill the rest of the column with the new formula. That would be my suggestion.

Display change in value of a cell in adjacent cell using excel VBA

I have an Excel sheet that displays the prices of certain items in a column by looking them up through the Amazon API using Excel VBA. The prices may change over time, so I am trying to display the difference in price each time I run my macro, in a cell adjacent to the cell that displays the price.
But I am not sure how to achieve this. Can anybody guide me on how to achieve this?
This is just a sample, it must be adapted to your schema and data layout. Say the prices are stored in column A from A1 to A100. Say you already have a macro called RefreshData() that updates column A. In B1 enter:
=C1-A1
and copy down. This macro stores the current values in column C before refreshing the data:
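Sub DoUpdate()
    ' Save the current prices from column A into column C before they are overwritten
    Range("A1:A100").Copy Range("C1")
    ' Refresh column A with the latest prices (RefreshData is your existing macro)
    Call RefreshData
End Sub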
Column B will display the price difference.
Something like this?
Let's say your data are in a range A2:A10
Dim rng As Range
Set rng = Range("A2:A10")
' Copy the current values into the column immediately to the right
rng.Offset(0, 1).Value = rng.Value
Run this before you run your original macro to store the values in an adjacent column before the values change. You may need to make the range dynamic, depending on your needs.
Without seeing your code, I cannot give a detailed answer. However, I ran across a similar problem once, not with the Amazon API but with a SharePoint connection.
If the Amazon API is somewhat similar to the SharePoint stuff, I guess it refreshes cells when you click "update" or run the update sub. In that case you will have to either create an array in VBA to store the old prices (a very slow process) and then write them to your table, or create a separate tab where you store the item-lastPrice combination.
I ended up storing not only current price but all prices and the date/time of the price, to be able to see change over time.
For the copying of data itself using VBA, either of the above methods should work. In my initial code I used VBA loops :-p, but copying using Excel functionality is much faster.
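All of the answers above share one pattern: snapshot the old prices, refresh, then subtract. A minimal Python sketch of that pattern follows; it is not VBA, and the function names, item names, and prices (in cents) are hypothetical. Note that it reports new minus old, which is the opposite sign of the =C1-A1 formula in the first answer.

```python
def refresh_with_differences(prices, fetch_prices):
    """Update prices in place and return the change per item (new minus old)."""
    previous = dict(prices)        # snapshot before the refresh overwrites anything
    prices.update(fetch_prices())  # stands in for the macro that refreshes the prices
    return {
        item: prices[item] - previous[item]
        for item in previous
        if item in prices
    }


# Hypothetical data, in cents to avoid floating-point rounding.
prices = {"widget": 1000, "gadget": 2500}


def fake_fetch():
    # Placeholder for the real price lookup.
    return {"widget": 1250, "gadget": 2400}


changes = refresh_with_differences(prices, fake_fetch)
```

Keeping every snapshot with a timestamp, as suggested above, would only require appending `previous` to a list instead of discarding it.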

Excel - How do I find all relevant rows by typing unique invoice# listed Col A

I have a Worksheet with 10 columns and a data range of A1:J55. Col A has the invoice # and the rest of the columns have other demographic data. The goal is to type the invoice number in a cell and display all the rows matching that invoice number from col A.
Besides the auto filter function, the only thing that comes to my mind is VBA. Please advise on the best way to get the data. Thanks for your help in advance.
Alright, I'm pretty proud of this one. Again avoiding VBA, this one uses the volatile function OFFSET to keep moving its MATCH search down the table until it has found all matches. Just make sure you paste enough rows of the formula so that if there are many matches, there's room for all of them to appear. If you put a border around your match area, it will be clear if you ever run out of room and need to copy the formula down some more.
Again, in the main section, it's just a single formula (using index):
=IFERROR(INDEX($A$1:$J$200,$M3,MATCH(N$2,$A$1:$J$1,0)),"")
This gets to be so simple because the hard work of the lookup is done by an initial column which looks up the next row that matches the invoice number. It has the formula:
=IFERROR(MATCH($L$2,OFFSET($A$1:$A$200,M2,0),0)+M2," ")
Here is the working example that goes with those formulas:
Let me know if you need any further description of how it works, but it mostly uses the same rules as above so that it's robust in copying and moving around.
I've uploaded the Excel file so you can play with it, but everything you need to reproduce this feature should be in this solution.
Google Docs - Click link and hit Ctrl+S to download and open in Excel.
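The helper-column formula above can be hard to follow, so here is a small Python sketch of the same chained search. The function name and sample column are hypothetical. Each step resumes the search just below the previous match, which is what OFFSET(..., M2, 0) does, and converts the relative position back to an absolute row number, which is what the +M2 does.

```python
def matching_rows(invoice_column, invoice):
    """Return the 1-based row numbers of every row that matches invoice."""
    found = []
    offset = 0  # number of rows already searched, like the previous helper cell
    while True:
        try:
            # Search only below the previous match, then convert to a 1-based row.
            row = invoice_column.index(invoice, offset) + 1
        except ValueError:
            break  # no more matches; the formula shows a blank here instead
        found.append(row)
        offset = row
    return found


# Hypothetical column A, including its header in row 1.
column_a = ["Invoice", "A1", "B2", "A1", "C3", "A1"]
rows_found = matching_rows(column_a, "A1")
```

Each row number returned here corresponds to one helper cell, which the INDEX formula then uses to pull the remaining columns for that row.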
A popular solution to this problem is a simple VLOOKUP. Look up the invoice number the user types in against the table A1:J55, and then return an adjacent column's data.
Here's an example of it working:
The formula in the highlighted cell is:
=VLOOKUP($L3,$A:$J,MATCH(N$2,$1:$1,0),FALSE)
What's nice about this formula is you only need to type it once and then you can copy it across and it'll automatically pick out the correct column of the table (that's the match part). The rest is very simple:
The first part says lookup value $L3 (the invoice number typed in),
The second part says look it up in range $A:$J (which is where your table is located). I've shown how you can select the entire columns $A:$J so that you can add and remove data without worrying about adjusting the range in your lookups. (Excel takes care of optimizing the formula so that unused cells aren't checked.)
The third part picks the column from which the resulting data will be drawn once a matching row is found.
The FALSE part is an indication that the invoice number must match exactly (no approximate matching allowed)
The $ signs ensure that fixed ranges like the location of your source table ($A:$J) and your lookup value ($L3) don't get automatically changed as you copy the formula across for multiple columns.
The formula is pretty easy to adapt if you want to move around your table and the area where you do your lookup. Here's an example:
Bonus
If you want to add a little spiff, you can add a dropdown to the Invoice # field so that the user gets auto-completion and the option to browse existing values like so: