Conditional Sum using VBA

I want VBA code that will give me a sum per category (displayed as a number). Note that this number changes and is not in serial order; as soon as it changes, I want the sum for that particular sub-division. For example, say Town is the category and Ward is the sub-category, so my data is laid out as follows:
Say the data shows that the town numbered 100 has 3 wards (1001, 1002, 1003) with their population figures, and the town numbered 123 has 2 wards (1231, 1232) with theirs. Now I want the total population of town 100, and similarly for town 123, and so on.

It sounds like a Pivot Table will do what you want and you won't need any custom VBA code at all.
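For illustration, suppose the ward populations are (hypothetical numbers): 1001 = 250, 1002 = 300, 1003 = 450, 1231 = 400, 1232 = 300. With Town in the Rows area and Population in the Values area (summarised as Sum), the pivot table would produce:
Town | Sum of Population
100 | 1000
123 | 700
Refreshing the pivot table picks up new towns and wards automatically, with no code required.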

Related

match tables with intermediate mapping table (fuzzy joins with similar strings)

I'm using BigQuery.
I have two simple tables with "bad" data quality from our systems. One represents revenue and the other production rows for bus journeys.
I need to match every journey to a revenue transaction, but I only have a set of fields, no key, and I don't really know how to do this matching.
This is a sample of the data:
Revenue
Year, Agreement, Station_origin, Station_destination, Product
2020, 123123, London, Manchester, Qwerty
Journeys
Year, Agreement, Station_origin, Station_destination, Product
2020, 123123, Kings Cross, Piccadilly Gardens, Qwer
2020, 123123, Kings Cross, Victoria Station, Qwert
2020, 123123, London, Manchester, Qwerty
Every station has a maximum of 9 alternative names, and these are stored in a "Stations" table.
Stations
Station Name, Station Name 2, Station Name 3,...
London, Kings Cross, Euston,...
Manchester, Piccadilly Gardens, Victoria Station,...
I would like to test matching/joining the tables with the original fields first. This will produce some matches, but many journeys will remain unmatched. For the unmatched revenue rows, I would then like to vary the fields: first shorten the product name to two letters (which may yield many matches from the journeys table), then swap in alternative station names, first for station_origin and then for station_destination. When using a shortened product name I may get several matches, but in that case I want the row from the journeys table with the most common product.
Something like this:
1. Do a direct match. That is, I can use the fields as they are in the tables.
2. Do a match where the revenue.product is changed by shortening it to two letters. substr(product,0,2)
3. Change rev.station_origin to the first alternative, Station Name 2, and then try a join. Neither the product nor the other station is changed.
4. Change rev.station_origin to the first alternative, Station Name 2, and then try a join. The product is shortened as above with substr(product,0,2), but rev.station_destination is not changed.
5. Change rev.station_destination to the first alternative, Station Name 2, and then try a join. Neither the product nor the other station is changed.
I was told that maybe I should create an intermediate table with all combinations of stations and products and let a rank column decide the order. The station names in the Stations table are in order of importance, so "Station Name" is more important than "Station Name 2", and so on.
I started to do a query with a subquery per rank and do a UNION ALL but there are so many combinations that there must be another way to do this.
Don't know if this makes any sense but I would appreciate any help or ideas to do this in a better way.
Cheers,
Cris
To implement a complex joining strategy with approximate matching, it might make more sense to define the strategy within JavaScript - and call the function from a BigQuery SQL query.
For example, the following query performs these steps:
Take the top 200 male names in the US.
Find if one of the top 200 female names matches.
If not, look for the most similar female name within the options.
Note that the logic to choose the closest option is encapsulated within the JS UDF fhoffa.x.fuzzy_extract_one(). See https://medium.com/@hoffa/new-in-bigquery-persistent-udfs-c9ea4100fd83 to learn more about this.
WITH data AS (
SELECT name, gender, SUM(number) c
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY 1,2
), top_men AS (
SELECT * FROM data WHERE gender='M'
ORDER BY c DESC LIMIT 200
), top_women AS (
SELECT * FROM data WHERE gender='F'
ORDER BY c DESC LIMIT 200
)
SELECT name male_name,
COALESCE(
(SELECT name FROM top_women WHERE name=a.name)
, fhoffa.x.fuzzy_extract_one(name, ARRAY(SELECT name FROM top_women))
) female_version
FROM top_men a
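If you would rather stay in pure SQL, the staged strategy from the question can also be written as a priority-ranked join: generate candidate matches per rule, tag each rule with a rank, and keep only the best-ranked match per journey. A minimal sketch, assuming tables revenue, journeys and stations as described above, with the stations columns renamed station_name, station_name_2, etc. (all table and column names here are illustrative):
WITH candidates AS (
  -- Rule 1: direct match on all fields
  SELECT j.*, 1 AS rule_rank
  FROM journeys j
  JOIN revenue r
    USING (year, agreement, station_origin, station_destination, product)
  UNION ALL
  -- Rule 2: product shortened to two letters
  SELECT j.*, 2
  FROM journeys j
  JOIN revenue r
    ON r.year = j.year AND r.agreement = j.agreement
   AND r.station_origin = j.station_origin
   AND r.station_destination = j.station_destination
   AND SUBSTR(r.product, 0, 2) = SUBSTR(j.product, 0, 2)
  UNION ALL
  -- Rule 3: revenue origin replaced by its first alternative name
  SELECT j.*, 3
  FROM journeys j
  JOIN revenue r
    ON r.year = j.year AND r.agreement = j.agreement
   AND r.station_destination = j.station_destination
   AND r.product = j.product
  JOIN stations s
    ON s.station_name = r.station_origin
   AND s.station_name_2 = j.station_origin
  -- Rules 4 and 5 follow the same pattern, shortening the product
  -- and/or swapping station_destination. In practice, also select the
  -- revenue columns you want to carry through.
)
SELECT *
FROM candidates
WHERE TRUE  -- BigQuery expects a WHERE/GROUP BY/HAVING alongside QUALIFY
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY year, agreement, station_origin, station_destination, product
  ORDER BY rule_rank
) = 1
To prefer the most common product among equally ranked matches, you could extend the ORDER BY with a product-frequency count computed over the journeys table.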

SUM column values based off differing results of separate column

Consider a relation that contains the names and number of locations of restaurants, including split and stand-alone restaurants:
RESTAURANT | NUM_OF_LOC
Pizza Hut | 1
Pizza Hut/Taco Bell | 2
Taco Bell | 2
Also assume you will not know the restaurant names (stand-alone or split) or the number of locations in advance. The only consistent piece is the "/" character between split restaurants.
How can the above table be returned as a result with the number of stand-alone restaurants summed into the split restaurants' counts, sorted in descending order, like so:
RESTAURANT | NUM_OF_LOC
Pizza Hut/Taco Bell | 5
Taco Bell | 2
Pizza Hut | 1
So are you looking to get the count of all restaurants for just Taco Bell and Pizza Hut, where a joint location counts as 1 for each, or are you looking to count all occurrences of each variant?
I'm thinking you aren't just looking for totals and instead want to tear apart the combined restaurants, so you can do something like:
SELECT name, COUNT(*)
FROM restaurants
WHERE name LIKE '%Taco Bell%'
GROUP BY name
Looking again, it seems like you want the consolidated groups to include all occurrences of either, which would be something like:
CREATE TABLE sums AS
SELECT name, COUNT(*) AS total
FROM restaurants
WHERE name LIKE '%Taco Bell%'
OR name LIKE '%Pizza Hut%'
GROUP BY name;
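If the names aren't known in advance and the only invariant is the "/", another option is to split each combined name and pull in the standalone counts of its components. A sketch in BigQuery-style SQL, assuming a table restaurants(restaurant, num_of_loc); the SPLIT/UNNEST syntax varies by dialect:
-- Split rows ('Pizza Hut/Taco Bell') add the standalone counts of their
-- components to their own count; standalone rows keep their count as-is.
SELECT r.restaurant,
       r.num_of_loc + COALESCE(SUM(s.num_of_loc), 0) AS total_loc
FROM restaurants r
LEFT JOIN UNNEST(SPLIT(r.restaurant, '/')) AS part ON TRUE
LEFT JOIN restaurants s
  ON s.restaurant = part
 AND r.restaurant LIKE '%/%'  -- only split rows absorb component counts
GROUP BY r.restaurant, r.num_of_loc
ORDER BY total_loc DESC
With the sample data this returns Pizza Hut/Taco Bell 5, Taco Bell 2, Pizza Hut 1, as requested.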

Get the most common item in a calculated column

I've figured out how to accomplish the equivalent of the following in a measure, but I need to use it as the legend in a Power View chart so it needs to be done in a calculated column. The change of context from calculated field to calculated column has completely screwed me up.
In my data model, I have a table of job applications. Each record has a single Specialty and an address for the company being applied to. Each specialty can show up multiple times in the table.
ApplicationTable:
ApplicationID | Name | Specialty | City | State
32911 |Joe Bob | Engineering | Miami | Florida
89547 |Ralph Kramden | Shouting | New York | New York
etc.
I also have a table of states. It just has columns for state name and postal abbreviation. I need to create a column with the most commonly occurring Specialty per state.
If I could do this as a calculated field, I would have been finished hours ago. I just used a pretty straightforward application of topn:
Top Specialty := FIRSTNONBLANK ( TOPN ( 3, VALUES ( ApplicationTable[Specialty] ), [Count of ApplicationID] ), ApplicationTable[Specialty] )
I used FIRSTNONBLANK and TOPN(3...) because some states only have a few applications, so each specialty only shows up once. In my application it's fine to just pick the first specialty in the list in those cases.
Anyway that formula is cool but it doesn't help here. So how do I do the equivalent in a calculated column, so I can use it as a key or a filter? Specifically, I think I need to do this in the StateTable, giving me the name of the specialty that occurs most per state in the ApplicationTable. Ideas?
First, create a basic measure to count specialties:
SpecialtyCount :=
COUNTA ( ApplicationTable[Specialty] )
Next, create a measure to figure out the highest single specialty (within a context):
MostSpecial :=
MAXX ( VALUES ( ApplicationTable[Specialty] ), [SpecialtyCount] )
Finally, add a calculated column to your States table:
=
FIRSTNONBLANK (
ApplicationTable[Specialty],
IF (
[SpecialtyCount]
= CALCULATE ( [MostSpecial], VALUES ( ApplicationTable[Specialty] ) ),
1,
BLANK ()
)
)
By placing this as a calculated column, our filter context is each state. So first PowerPivot will filter the ApplicationTable to just applications within the state, and then it will use FIRSTNONBLANK() to iterate through each ApplicationTable[Specialty], calculate its SpecialtyCount and see if that equals the MostSpecial count within that state. If so, it's not blank, and that's the specialty it returns.
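A worked example with hypothetical counts: suppose Florida has 5 applications for Engineering and 2 for Shouting. In Florida's row of the States table, [SpecialtyCount] evaluates to 5 for Engineering and 2 for Shouting, [MostSpecial] evaluates to 5, so the IF returns 1 only for Engineering and BLANK() for everything else, and FIRSTNONBLANK() therefore returns Engineering.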

MS Access Small Equivalent

I have this working in Excel; however, it really needs to be moved into Access, as that's where the rest of the database resides.
It's simply one table that contains Unique_ID, Seller and Fruit...
1 Chris Orange
2 Chris Apple
3 Chris Apple
4 Sarah Kiwi
5 Chris Pear
6 Sarah Orange
The end result should be displayed by Seller, followed by a list of each fruit sold. (In the following example Robert has not sold any fruit; I do have a list of all sellers' names, but that can be ignored here, as I believe it will be easy to integrate.) Each seller will only sell a maximum of 20 fruit.
Seller 1st 2nd 3rd 4th
Chris Orange Apple Apple Pear
Sarah Kiwi Orange
Robert
At the moment Excel uses Index, Match and Small to return results. Small is simply used on the Unique_ID to find the 1st, 2nd, 3rd, etc. smallest entries, and it is matched to each seller's name to build the above results.
As Access doesn't have a Small function, I am at a loss! In reality there are over 100,000 records (minimum) with over 4,000 sellers... they are also not fruit :)
TRANSFORM First(Sales.Fruit) AS FirstOfFruit
SELECT Sales.Seller
FROM Sales
GROUP BY Sales.Seller
PIVOT DCount([id],"sales","seller='" & [seller] & "' and id<=" & [id]);
Where the table name is "Sales" and the columns are "ID", "Seller" and "Fruit"
To understand DCount better, use it in a SELECT query instead of a crosstab:
SELECT Sales.ID, Sales.Seller, Sales.Fruit, DCount([id],"sales","seller='" & [seller] & "' and id<=" & [id]) AS N
FROM Sales;
On each row, the last column is the DCount result. The syntax is DCount(field, source, expression), so it counts the IDs (field) in the Sales table (source) that match the expression; in other words, rows with the same seller as the current record and an ID <= the current record's ID. So for Chris's sales it numbers them 1 through 4, even though Sarah had a sale in the middle.
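Applied to the sample data above, the SELECT produces:
ID | Seller | Fruit | N
1 | Chris | Orange | 1
2 | Chris | Apple | 2
3 | Chris | Apple | 3
4 | Sarah | Kiwi | 1
5 | Chris | Pear | 4
6 | Sarah | Orange | 2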
From this result, it's easy to build a Crosstab query that makes a table with Seller in the row and N in the column, putting the sales in order for each seller the way you wanted to see them. The First function picks the fruit for each combination of seller and N. You could just as easily use Max or Min here, since there is only one record matching each seller row and N column, but Crosstab queries require an aggregate function for the Value field and cannot use "Group by" there.
My 1st answer combines these steps - the select and the crosstab queries - in one query.
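For reference, the two-step version looks like this: save the SELECT above as a query (here called SalesNumbered; the name is illustrative), then build the crosstab on top of it:
TRANSFORM First(SalesNumbered.Fruit) AS FirstOfFruit
SELECT SalesNumbered.Seller
FROM SalesNumbered
GROUP BY SalesNumbered.Seller
PIVOT SalesNumbered.N;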
Hope this helps.

Microsoft Access 2010 - Updating Multiple Rows with Different values in ONE query

I have a question about updating multiple rows with different values in MS Access 2010.
Table 1: Food
ID | Favourite Food
1 | Apple
2 | Orange
3 | Pear
Table 2: New
ID | Favourite Food
1 | Watermelon
3 | Cherries
Right now, it looks deceptively simple to execute them separately (because this is just an example). But how would I execute a whole lot of them at the same time if I had, say, 500 rows to update out of 1,000 records?
So what I want to do is to update the "Food" table based on the new values from the "New" table.
Would appreciate if anyone could give me some direction / syntax so that I can test it out on MS Access 2010. If this requires VBA, do provide some samples of how I should carry this out programmatically, not manually statement-by-statement.
Thank you!
ADDENDUM (REAL DATA)
Table: Competitors
Columns: CompetitorNo (PK), FirstName, LastName, Score, Ranking
query: FinalScore
Columns: CompetitorNo, Score, Ranking
Note - this query is a query of another query, which in turn, is a query of another query (could there be a potential problem here? There are at least 4 queries before this FinalScore query is derived. Should I post them?)
In the competitors table, all the columns except "Score" and "Ranking" are filled. We would need to take the values from the FinalScore query and insert them into the relevant competitor columns.
Addendum (Brief Explanation of Query)
Table: Competitors
Columns: CompetitorNo (PK), FirstName, LastName, Score, Ranking
Sample Data: AX1234, Simpson, Danny, <blank initially>, <blank initially>
Table: CompetitionRecord
Columns: EventNo (PK composite), CompetitorNo (PK composite), Timing, Bonus
Sample Data1: E01, AX1234, 14.4, 1
Sample Data2: E01, AB1938, 12.5, 0
Sample Data3: E01, BB1919, 13.0, 2
EventNo specifies the unique event ID.
Timing measures the time taken to run 200 metres. The lower, the better.
Bonus can be given in 3 values (0 - Disqualified, 1 - Normal, 2 - Exceptional). Competitors with Exceptional are given bonus points (5% off their timing).
Query: FinalScore
Columns: CompetitorNo (PK), Score, Ranking
Score is calculated by wins. For example, in the above event (E01) there are three competitors. The winner of the event is BB1919, because its Exceptional bonus cuts its timing by 5%: 13.0 × 0.95 = 12.35, which beats AB1938's 12.5. Winners get 1 point. Losers don't get any points, and those that are disqualified do not receive any points either.
This query lists the competitors and their cumulative scores (from a list of many events: E01, E02, E03, etc.) and calculates their ranking in the Ranking column every time the query is executed. (For example, the person who wins the most 200m events would be at the top of the list.)
Now, I am required to update the Competitors table with this information. The query is rather complex - with all the grouping, summations, rankings and whatnots. Thus, I had to create multiple queries to achieve the end result.
How about:
UPDATE Food
INNER JOIN [New]
ON Food.ID=New.ID
SET Food.[Favourite Food] = New.[Favourite Food]
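Applied to the tables in the addendum, the same pattern would be (a sketch, assuming FinalScore is a saved query with the columns listed above):
UPDATE Competitors
INNER JOIN FinalScore
ON Competitors.CompetitorNo = FinalScore.CompetitorNo
SET Competitors.Score = FinalScore.Score,
Competitors.Ranking = FinalScore.Ranking;
One caveat: Access can only run an UPDATE against an updateable source, and queries built on grouping/aggregation (like FinalScore) usually are not, so you may hit "Operation must use an updateable query". In that case, materialize the query first (e.g. SELECT * INTO FinalScoreTemp FROM FinalScore) and join the UPDATE to FinalScoreTemp instead.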