Dynamically creating a pivot table using fuzzy matching


So, I'm constantly being given data in new and different formats. I'm on a crusade to get my workplace to standardize data for easy use, and if I manage to convince the powers that be, this problem becomes entirely moot. Until then, I have the following problem:
I get data in a variety of ways. Sometimes my gross sales are called total sales. Sometimes it's gross sales before discounts, total sales before discounts, Gross_Sales, etc. Discounts, deductions, exempt amounts, etc. form another column. And so on. I'd like to be able to do the following:
1) Figure out what columns I want,
2) Turn those columns into a pivot table.
For part 1, I have two options, and I'm wondering if there are any more. The first is to use Microsoft's fuzzy-matching add-in to help me match; I'd have a separate tab dedicated to fuzzy matching each column I need. The second is to just generate a long list of all the variants and test each one until I find a hit, assign it, and move on to testing the next one.
The second part is turning all of this into a pivot table. The resources I have so far are https://www.thespreadsheetguru.com/blog/2014/9/27/vba-guide-excel-pivot-tables and How to Create a Pivot Table in VBA
Is there a better method? Is there another way?
Edit: Slightly better method: grab the data columns, place them into a table, and pivot everything off of that table. That removes the need to re-create pivot tables; I just need to move the data over.
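The "long list of variants" option from part 1 can be sketched as a lookup table keyed on normalized header text. This is only an illustration of the logic, written in Python rather than VBA; the `ALIASES` table, the canonical names, and the helper names are all hypothetical, with the variants taken from the examples in the question.

```python
import re

# Hypothetical alias table: canonical field name -> known header variants.
ALIASES = {
    "gross_sales": [
        "gross sales",
        "total sales",
        "gross sales before discounts",
        "total sales before discounts",
    ],
    "discounts": ["discounts", "deductions", "exempt amounts"],
}


def normalize(header):
    """Lowercase a header and collapse punctuation/underscores into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", header.lower()).strip()


def map_headers(headers):
    """Return {canonical name: original header} for each header that hits an alias."""
    lookup = {
        normalize(variant): name
        for name, variants in ALIASES.items()
        for variant in variants
    }
    found = {}
    for header in headers:
        name = lookup.get(normalize(header))
        # Keep the first header that matches each canonical name.
        if name is not None and name not in found:
            found[name] = header
    return found


columns = map_headers(["Store", "Gross_Sales", "Exempt Amounts"])
```

Normalizing before the lookup means spelling variants such as `Gross_Sales` and `gross sales` need only one entry in the list. Headers that match nothing are simply left out of the result, so they can be reviewed by hand or passed to a fuzzy matcher.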

Having the same problem, I use a mix of your two methods.
My data consists of a bunch of logs for rejected x-ray images, and the reject reason is a free text field. My solution was to create a table where the first column contains my desired output categories, and then each subsequent column contains a different variation of it.
For example, a row might have (column one, the output category, is the first entry):
Positioning, POS, Positioning Error, Patient Positioning
Note that these are all fairly different from each other. This is where the fuzzy matching comes in: it captures the smaller differences and misspellings around those other columns. When the fuzzy matching decides a given reason matches a column's entry, the reason is replaced with the desired output category from column 1 of the table. In my example, a reason of 'Possitioning Err' [sic] would match column 3 (Positioning Error) and then get converted to Positioning.
Then wash, rinse, repeat over the rest of your data as needed. This approach was super useful and fairly flexible in helping standardize my data. It was also computationally more expensive, but you should only need to run the matching portion once.
As for the actual mechanics: I use the 2010 version, so there's no built-in functionality for this. I run the fuzzy matching code on a temporary worksheet until the best percentage matches are found, and then overwrite the actual source data afterwards.
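The lookup-table-plus-fuzzy-matching idea can be sketched as follows. This is not the code I actually run; it is a minimal Python illustration that swaps in the standard library's `difflib.SequenceMatcher` as the similarity measure. The `best_category` helper and the 0.8 threshold are assumptions for the example, and the table holds only the example row given above.

```python
from difflib import SequenceMatcher

# Each row: desired output category first, then known variants of it.
CATEGORY_TABLE = [
    ["Positioning", "POS", "Positioning Error", "Patient Positioning"],
]


def best_category(reason, table=CATEGORY_TABLE, threshold=0.8):
    """Return the output category of the variant most similar to `reason`,
    or None if no variant reaches the (assumed) similarity threshold."""
    best_score, best_output = 0.0, None
    for row in table:
        for variant in row:
            score = SequenceMatcher(None, reason.lower(), variant.lower()).ratio()
            if score > best_score:
                # row[0] is the standardized category for every variant in the row.
                best_score, best_output = score, row[0]
    return best_output if best_score >= threshold else None


cleaned = best_category("Possitioning Err")  # closest variant: "Positioning Error"
```

Reasons that fall below the threshold come back as `None`, which makes it easy to collect them for manual review and then add them to the table as new variants.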

Related

Number of unique IDs and sum of values per category in a large dataset

I have a large dataset (approx. 250,000 records) where I have an ID column, different categories an ID can belong to, and a value column:
Now I want to count the unique occurrences of each ID per category, and likewise calculate the sum of the values per category. The result for the example should look like this:
In the example I was able to do this manually. However, I have a large dataset and I cannot do this manually. I thought about it in different ways, but I did not find a good solution for this. One way would be to do it for each single cell with the PowerQuery Editor and then enter the desired number for each cell (this is the way I used to create the solution for the example). But then I have to do this manually with PowerQuery for each cell. Also doing all the work with usual Excel formulas for each single cell is not a good solution, as it includes a lot of manual work. And I would like to avoid doing it manually and thought there must be a better way. If there is an Excel solution I am happy with it. If it is necessary to use VBA I am also ok with it.
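To make the desired calculation concrete, here is a minimal Python sketch of the logic. It assumes the data is in a "long" layout with one (ID, category, value) record per row, which may not match the actual sheet; the `summarize` helper and the sample records are made up for illustration.

```python
from collections import defaultdict


def summarize(records):
    """records: iterable of (id, category, value) tuples.
    Returns {category: (number of distinct IDs, sum of values)}."""
    ids = defaultdict(set)
    totals = defaultdict(float)
    for record_id, category, value in records:
        ids[category].add(record_id)   # a set keeps each ID only once
        totals[category] += value
    return {category: (len(ids[category]), totals[category]) for category in ids}


# Invented sample data, not taken from the real dataset.
sample = [(1, "A", 10.0), (1, "A", 5.0), (2, "A", 2.5), (2, "B", 4.0)]
result = summarize(sample)
```

A single pass over the records is enough, so a dataset of this size should not be a problem for this kind of approach.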

How to apply a single section across multiple columns in Business Intelligence

I do a lot of reporting out of our Electronic Health Record using a Business Objects product, and one thing I run into frequently is records for which most of the columns are the same, but a few may have multiple different values.
For instance, a report I'm working on has 8 columns, mostly static information about the patient/encounter, some lab values, and a column for the consulting physician. All the columns will have only a single value per patient/encounter, except for consulting physician which may have multiple. I'd like to somehow set the table to show only a single row for the data that is unchanged, so they don't end up seeing the FIN, MRN, and lab values over and over.
However, as far as I've been able to tell with my fiddling around, I can only apply a section or break to a single column. Creating multiple sections or breaks nests them. Does anybody know of a way to treat multiple columns as sort of a composite section?
edit: I did try pulling the consulting physician column out into its own table and then setting the room number as a section, but it still caused repeated rows of the other data for any that had multiple consultings.
Additional edit: As requested here's a mockup of approximately what I'd like to see. This is mostly how it looks already when I tell BO to use the room number (the number in blue, top left of each row) as a section, however in the case of the third room, it would repeat the information in the first 5 columns for each consulting listed.

Couple of ways to do it, but putting breaks on each column is what I would do.
So, starting from "FIN" and working to "Attending", add a break on each column. It will add a summary row for each, so it will look like:
Then select the summary rows, right-click, and Delete:

Best way to handle multi-valued fields as a view/grid

In several Notes applications, instead of handling related data as separate documents, if the size of the data is small (less than the 32k limit), I'll make several multi-valued fields and display them in what I call a "List Panel". It's a table where each column displays one multi-value field. Since fielda(1) goes with fieldb(1), which goes with fieldc(1), there is a concept of rows. (I did a similar thing in my auditing routine discussed here.)
It is always assumed that each field has exactly the same number of elements.
All the multi-value fields are then stored on the single document. This avoids several coding conventions that made my eyes bleed like having date changed, who changed it, new value fields for each field we wanted to audit. Another thing that this kept to a minimum was having to provide multiple fields for the same thing that locked you into a limit. Taxrate1, Taxrate2, Taxrate3, etc...
In my "List Panel" the first column is a vertical checkbox (one for each element in my lists). This is so I can select one item to bring up and edit, or select multiple values to delete "rows" or apply some kind of mass change to them.
What would be the best way to handle this under XPages to get this functionality? I tried making a table but am having a devil of a time getting the checkboxes to line up with their corresponding data items.
Views and dojo-grids seem to assume we're using a document for each row...

This TableWalker may provide what you want http://www-10.lotus.com/ldd/ddwiki.nsf/dx/Tutorial-Introduction-to-XPages-Exercise-23
It was created when XPages was all very new, so it's SSJS rather than Java. But if you're comfortable with Java, converting it probably won't be a challenge.

You could use a repeat control to display the values and build a table using the table row tags in the repeat. You would want to calculate the id of the checkbox to be able to take an action on that selected row. The repeat var would be just one of your multi-value fields and you use the index of the repeat to get the value for that row from the other multi-value fields.
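The index-based idea can be illustrated outside XPages. The sketch below is plain Python, not SSJS or XPages markup, and the `build_rows` and `delete_rows` helpers are hypothetical names. It shows how one index ties the parallel multi-value fields together into rows, and how deleting the selected indexes from every field keeps the lists aligned.

```python
def build_rows(fields):
    """fields: {field name: list of values}; all lists must have equal length.
    Returns one dict per "row", tagged with the index a checkbox would carry."""
    lengths = {len(values) for values in fields.values()}
    if len(lengths) > 1:
        raise ValueError("multi-value fields have different lengths")
    count = lengths.pop() if lengths else 0
    return [
        {"index": i, **{name: values[i] for name, values in fields.items()}}
        for i in range(count)
    ]


def delete_rows(fields, selected):
    """Drop the selected indexes from every field so the lists stay aligned."""
    return {
        name: [value for i, value in enumerate(values) if i not in selected]
        for name, values in fields.items()
    }


doc = {"fielda": [1, 2, 3], "fieldb": ["x", "y", "z"]}
rows = build_rows(doc)
remaining = delete_rows(doc, {0, 2})
```

In a repeat control the same role is played by the repeat index: the checkbox for a row only needs to carry that index, and every action on the selected rows is applied to all of the multi-value fields at once.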

Rotating a table object

For the sake of clarity, I'm not looking for the transpose function, like has been asked previously on this site quite a few times.
The Excel table object (Insert>Table) is exactly what I want. All I'd like to do is rotate the table for readability purposes. Currently, I have far more columns than rows, and it would improve readability a lot if the axes were flipped so that the vertical aspect of the scroll wheel could be used while looking at the data.
My current data is a list of machining jobs. Each row is one job for one customer, and each column is a different parameter corresponding to settings/cost/material. With the tables, I'm able to dynamically sort the jobs in the table based on each of the parameters in a very helpful manner. This is the most important thing: to continue being able to dynamically sort based on the parameters. So far, I have been unable to find a way to transfer dynamic sorting to row headings instead of column headings.

How to display filtered data rows as a tooltip in Tableau Public?

Noob here, I have a table with different entries (rows) per different (repeating) regions.
I'd like to be able to display the data rows filtered to match a particular region, so that I get the fields related to each region as a tooltip on a map. (I know how to build the map.)
Thank you

Just dragging the fields you want to Details or Tooltip is not doing the trick?
Putting a measure on a shelf (other than the filter shelf) includes that field in the visualization query results, i.e. it applies the chosen aggregation function to yield an aggregate result value for each partition of the data (as specified by the unique combination of dimensions).
Putting a dimension on a shelf (other than the filter shelf) also includes that field in the query results, but since the dimensions define how data rows are partitioned, it can affect the level of detail of the query. You'll notice this often as suddenly getting many more marks in your visualization after you add a dimension to a shelf. If you are familiar with SQL, dimensions define the fields that follow the GROUP BY keyword.
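The GROUP BY analogy can be made concrete with a small sketch. This is plain Python, not anything Tableau runs; the `aggregate` helper and the sample rows are invented for illustration. It shows how adding a dimension increases the number of partitions, each of which gets its own aggregated measure.

```python
from collections import defaultdict


def aggregate(rows, dimensions, measure):
    """Sum `measure` for each unique combination of `dimensions`, similar in
    spirit to SELECT dims, SUM(measure) ... GROUP BY dims."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Invented sample data.
rows = [
    {"region": "North", "product": "Bread", "sales": 3.0},
    {"region": "North", "product": "Milk", "sales": 2.0},
    {"region": "South", "product": "Bread", "sales": 4.0},
]

by_region = aggregate(rows, ["region"], "sales")                      # 2 partitions
by_region_product = aggregate(rows, ["region", "product"], "sales")   # 3 partitions
```

With only `region` as a dimension there is one aggregate per region; adding `product` splits the same rows into finer partitions, which is the change in level of detail described above.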
EDIT
Thanks for the addition, @AlexBlakemore. I never said dragging a dimension would not work, only that it wouldn't work as the OP wanted it to (basically the same as you're saying).
And though everything you said (above) is true, it's not quite exact when it comes to maps. Yes, dragging a dimension will further partition the data, but it will not create additional marks on a map (unless it also has geographical properties). Rather, the tooltip will get the first occurrence of that dimension, and display data for that only. For instance, if you drag "Product" to details, and the possible values are "Bread", "Coffee" and "Milk", it will probably just show "Product: Bread", and the measures for "Bread" only. So yes, it will partition, but no, it won't create additional marks.
Back to the OP's problem. What I believe you want is a tooltip with all values of the dimension (in my poor example you'd like to see "Bread, Coffee, Milk"). Tableau does not have functions to aggregate strings yet, so it's hard to do so.
What I would suggest is to create a separate sheet, and just drag the dimensions and measures you want to rows. Then put it side by side with the map on a dashboard, and use the map as a filter. Then, when you click on a country/region/city, you'll see the data of that region on the other chart.
Refer to: http://kb.tableausoftware.com/articles/knowledgebase/creating-filter-actions-dashboards
or https://www.tableausoftware.com/learn/tutorials/on-demand/authoring-interactivity
