Number of unique IDs and sum of values per category in a large dataset - vba

I have a large dataset (approx. 250.000 records) where I have an ID column, different categories an ID can belong to and a value column:
Now I want to calculate the unique occurences of each ID per each category and same for the sum of the value. The result for the example should look like this:
In the example I was able to do this manually. However, I have a large dataset and I cannot do this manually. I thought about it in different ways, but I did not find a good solution for this. One way would be to do it for each single cell with the PowerQuery Editor and then enter the desired number for each cell (this is the way I used to create the solution for the example). But then I have to do this manually with PowerQuery for each cell. Also doing all the work with usual Excel formulas for each single cell is not a good solution, as it includes a lot of manual work. And I would like to avoid doing it manually and thought there must be a better way. If there is an Excel solution I am happy with it. If it is necessary to use VBA I am also ok with it.

Related

Dynamically creating a pivot table using fuzzy matching

So, I'm constantly being given data in new and different formats. I'm on a crusade to get my work to standardize data for easy use, and if I managed to convince the powers that be to standardize data, this problem becomes entirely moot. Until then, I have the following problem:
I get data in a variety of ways. Sometimes my gross sales are called total sales. Sometimes gross sales before discounts, total sales before discounts, Gross_Sales, etc. Discounts, deductions, exempt amounts, etc. form another column. So on and so forth. I'd like to be able to do the following:
1) Figure out what columns I want,
2) Turn those columns into a pivot table.
For part 1, I have two options, and I'm wondering if there's anymore: The 1st is to use Microsoft's fuzzy-matching add-in to help me match. I'd have a separate tab dedicated to fuzzy matching each column I need. The second is to just generate a long list of all the variants, and to test each one until I find a hit, assign it, and move onto testing the next one.
The second part is turning all of this into a pivot table - the resouces I have so far are https://www.thespreadsheetguru.com/blog/2014/9/27/vba-guide-excel-pivot-tables and How to Create a Pivot Table in VBA
Is there a better method? Is there another way?
Edit: Slightly better method - Grab the data columns, place them into a table, and pivot everything off of that table - it removes the need to re-create pivot tables, just need to move the data over.
Having the same problem, I use a mix of your two methods.
My data consists of a bunch of logs for rejected x-ray images, and the reject reason is a free text field. My solution was to create a table where the first column contains my desired output categories, and then each subsequent column contains a different variation of it.
For example, a row might have (column one/ouput first entry):
Positioning, POS, Positioning Error, Patient Positioning
Note that these are all fairly different from each other. Where the fuzzy matching comes in - it is used to capture all the smaller differences and mispellings around those other columns. When the fuzzy matching section decides a given reason matches a column's entry, it is then replaced with the appropriate desired output reason from column 1 of the table. In my example, a reason of 'Possitioning Err' [sic] would match to column 3 (Positioning Error) and then get converted to Positioning.
Then wash rinse repeat over the rest of your data as needed. This approach was super useful and fairly flexible in helping standardize my data. It was also computationally more expensive, but you'd only need to run the matching portion once I guess.
As for the actual mechanics of going about doing this - I use 2010, so no inbuilt functionality. I run the fuzzy matching code on a temporary worksheet until best percentage matches are found, and then overwrite the actual source data afterwards.

VBA duplicate issue

I am really stucked and I do not know how to make it possible regarding coding in VBA or any coding language. I have to do something for my workplace. I have list in the Column A in my excel sheet. I have different values in it but sometimes I have the same value in different cells in the same column. I cannot remove the duplicates because those same values are a contract-id. In the 9th column I have the total sum of the value of the contract. So it is about automatization of invoices with the help of vba.
I need to add the same values together with the total sum into a different sheet.
Let's say it looks that:
omni345 54€
omni334 2€
omni345 23€
I need to put the omni345's into one single invoice pdf. I do not know how to put them together. Furthermore I need to place those 2 same values in the invoice like this which is in a different sheet:
omni345 54€
omni345 23€
So that the client clearly can see that there are x different contracts.
What I need is the right direction how to start it.

Excel Macro to combine cells of data when data matches in another column

The best way I can explain my problem is by showing a few screenshots.
I need to turn data like this:
[
Into something that displays like this:
After Data
There are multiple part numbers in the file, and I need the macro to take all the data from a matching part number and transform the data into what is displayed in the second image. All the part numbers are grouped with their data together, so it wouldn't need to run the loop through the top every single time, but adding to the entries with each new piece of data. Something also needs to be done for the years as well, because the way the data is presented, is in a range of years, and I need an entry for each year in that range.
Additional Information:
I am using this data for prep for category data for a BigCommerce site, that is working with a year/make/model plugin on the site, to create a vehicle lookup system. Thus in order for the user to look up their vehicle accurately the categories need to be listed the way they are in the second picture, which needs to be the result of the macro.
I thank anyone who takes the time to look into this, it will cut down the time I spend doing this manually by a huge amount.
You can do this with a formula (without actual VBA):
In cell F2 write: ="YMM/"&C2&"/"&D2&"/"&E2&";"
In cell F3 write: =F2&"YMM/"&C3&"/"&D3&"/"&E3&";"
drag down the formula in F3 until the last row.
The last row will contain the entire string of all vehicles.
I just noticed you may have duplicate values. You can use the built in Remove Duplicates feature to remove those before using the above technique.

Count unique string variants

There could be quite a simple solution to this, but I am trying to find the number of times a unique variant (i.e. non-duplicates) of a string appears in a column. However this string is only part of the text contained in a cell, and not the entire cell. To illustrate:
EuropeSpainMadrid
EuropeSpainBarcelona
AsiaChinaShanghai
AsiaJapanTokyo
EuropeEnglandLondon
EuropeSpainMadrid
I would like to find how many unique instances there are of a string that contains "EuropeSpain". So using this example, I would find that a variant of "EuropeSpain" appears only twice (given that the second instance of "EuropeSpainMadrid" is a duplicate).
A solution to this is to use pivots to summarise the data and remove duplicated; however given that my underlying dataset changes often this would require manual adjustments and corrections. I would therefore like to avoid adding any intermediate steps (i.e. PivotTables, other data sets etc) between my data and the counts.
UPDATE: I now understand to use wildcards to solve the first part of my question (counting the occurrences of "EuropeSpain"), however I am not yet clear on the second part of my question (how to find the number of unique occurrences).
Is there a formula or VBA code that could do this?
Using wildcards:
=COUNTIF(A1:A6,"="&"*"&C1&"*")
For without VBA but with some versatility, I suggest with Text in ColumnA (labelled), ColumnB labelled Flag and EuropeSpain in C1:
=FIND(C$1,A2)
in B2 copied down.
Then pivot A:B with Flag for FILTERS (and 1 selected), Text for ROWS and Count of Text for Sigma VALUES.
Apply Distinct Values if required (and available!), alternatively a formula of the kind:
=MATCH("Grand Total",E:E)-4
would count uniques.

Excel - How do I find all relevant rows by typing unique invoice# listed Col A

I have a Worksheet with 10 columns and data range from A1:J55. Col A has the invoice # and rest of the columns have other demographic data. Goal is to type the invoice number on a cell and display all the rows matching the invoice number from col A.
Besides auto filter function, the only thing comes to my mind is VBA. Please advice what is the best way to get the data. Thanks for your help in advance.
Alright, I'm pretty proud of this one. Again avoiding VBA, this one uses the volatile formula OFFSET to keep moving its VLOOKUP search down the table until it's found all matches. Just make sure you paste enough rows of the formula that if there are many matches, there's room for all of them to appear. If you put a border around your match area then it would be clear if you ever ran out of room and needed to copy down the formula some more.
Again, in the main section, it's just a single formula (using index):
=IFERROR(INDEX($A$1:$J$200,$M3,MATCH(N$2,$A$1:$J$1,0)),"")
This gets to be so simple because the hard work of the lookup is done by an initial column which looks up the next row that matches the invoice number. It has the formula:
=IFERROR(MATCH($L$2,OFFSET($A$1:$A$200,M2,0),0)+M2," ")
Here is the working example that goes with those formulas:
Let me know if you need any further description of how it works, but it mostly uses the same rules as above so that it's robust in copying and moving around.
I've uploaded the Excel file so you can play with it, but everything you need to reproduce this feature should be in this solution.
Google Docs - Click link and hit Ctrl+S to download and open in Excel.
A popular solution to this problem is a simple VLookup. Lookup the invoice the user types in on the table A1:J55, and then return an adjascent column's data.
Here's an example of it working:
The formula in the highlighted cell is:
=VLOOKUP($L3,$A:$J,MATCH(N$2,$1:$1,0),FALSE)
What's nice about this formula is you only need to type it once and then you can copy it across and it'll automatically pick out the correct column of the table (that's the match part). The rest is very simple:
The first part says lookup value $L3 (the invoice number typed in),
The second part says look it up in range $A:$J (which is where your table is located). I've shown how you can select the entire columns $A:$J so that you can add and remove data without worrying about adjustin the range in your lookups. (Excel takes care of optimizing the formula so that unused cells aren't checked)
The third part picks the column from which the resulting data will be drawn once a matching row is found.
The FALSE part is an indication that the invoice number must match exactly (no approximate matching allowed)
The $ signs ensure that fixed ranges like the location of your source table ($A:$J) and your lookup value ($L3) don't get automatically changed as you copy the formula across for multiple columns.
The formula is pretty easy to adapt if you want to move around your table and the area where you do your lookup. Here's an example:
Bonus
If you want to add a little spiff, you can add a dropdown to the Invoice # field so that the user gets auto-completion and the option to browse existing values like so: