Excel VBA - Random extra grouping levels

This is not really an issue that affects the code, but rather a question about the table's appearance.
The table is a summary of income and expense records for different business departments. Let's call each department a record type. Each type has subtype1 entries, each subtype1 has subtype2 entries, and each subtype2 has subtype3 entries.
So the sample data would be something like this.
1, Type1, sum of subtypes1
1.1, Subtype1, sum of subtypes2
1.1.1, Subtype2, sum of subtypes3
1.1.1.1, Subtype3, amount
1.1.1.2, Subtype3, amount
1.2, Subtype1, sum of subtypes2
1.2.1, Subtype2, sum of subtypes3
1.2.1.1, Subtype3, amount
Each subtype can have a different number of child subtypes, and nesting never goes deeper than subtype3.
I then use a VBA script to group the records of each subtype under their direct parent, all the way up to the main type. Everything works fine: I can expand or collapse every single level of this structure.
However, the group outline on the left side of the table should logically show 4 levels. Instead it shows 8. The first 4 do exactly what you would expect, showing or hiding the respective subtypes, while the other 4 levels do absolutely nothing, which is unsurprising because I see no reason for them to exist.
Any ideas why the extra levels were created and how to get rid of them?
I may not have explained this very clearly, so feel free to ask for further information.

Try stepping through your code in the debugger to watch the groups being set up (open the VBA editor and use the F8 key to execute one line at a time).
This may reveal why the extra groups are being defined and suggest what to change.
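One common cause, illustrated by the hedged sketch below (the original code isn't posted, so the sheet name and row ranges are assumptions): each call to Rows(...).Group adds one more outline level, so re-running the grouping macro over rows that are already grouped stacks a second set of levels on top of the first, which would turn 4 levels into exactly 8. Clearing the outline before rebuilding avoids that:

Sub RebuildGroups()
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Summary") ' assumed sheet name

    ' Start from a clean outline; without this, every re-run of the
    ' macro stacks a fresh set of group levels on top of the old ones.
    ws.Cells.ClearOutline

    ' Each nested .Group call adds exactly one outline level,
    ' so four nested calls give the expected four levels.
    ws.Rows("2:9").Group   ' level 1: everything under Type1
    ws.Rows("3:6").Group   ' level 2: the first Subtype1 block
    ws.Rows("4:6").Group   ' level 3: the first Subtype2 block
    ws.Rows("5:6").Group   ' level 4: the Subtype3 rows
End Sub

If the sheet already shows 8 levels, running Cells.ClearOutline once (or Data > Ungroup > Clear Outline in the ribbon) and then regrouping should bring it back to 4.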

Related

TABLEAU Totals not matching what's in view

I've been dealing with this issue in various ways throughout my time with this dataset in Tableau.
As you can see, the total count of properties for each city includes properties that have been successfully filtered out of the view. Why? The dyn.RANKED Profitable Investments (grouped) variable on the Filters shelf is an attempt to double down on the same thing as the first line of the calculated field: ignore the unwanted properties in each city. The view ignores them, but the totals do not.
If the Watershed Property pill is removed from the Rows shelf, then the dyn.NumProps_in_City results shown in the table each match the totals you see here (i.e., despite the first line of the calculated field, properties that do not meet that opening condition are still being counted), even though the view with the Watershed pill knows not to show them.
Also, if the Watershed Property pill is removed from the Rows shelf, the dyn.RANKED Profitable Investments (grouped) variable on the Filters shelf suddenly has only one category to choose from (i.e., 'INVEST') when you go to edit the filter. That would be great, since it's the category I care about, but not if the counts include things outside that category despite the filter.
Messing around with INCLUDE, EXCLUDE, and FIXED in the calculated field doesn't seem to work here, since I can't get around the various aggregate/non-aggregate and/or ATTR errors no matter where I place them. Also, my incorrect counts are not suffering from an LOD issue (the LOD is correct); the problem is that the unwanted rows are not being consistently filtered out at the desired LOD.
Please advise!
Thanks,
Christian
It seems that the dyn.RANKED calculated field computes its value prior to filtering. This can happen if you have used any LOD calculations in its syntax.
Simply right-click such fields on the Filters shelf and click Add to Context. This will cause the LOD calculations to be evaluated after the filtering.
In Tableau's order of operations, context filters sit above LOD calculations in precedence, but measure filters sit below them. Therefore, if measures are used as filters, they have to be added to context so that their order of precedence is above such calculations.
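For illustration (the field names here are assumptions, not from the original workbook): a per-city count written as a FIXED LOD, like the one below, is computed before ordinary dimension and measure filters in the order of operations, which is exactly why those filters must be added to context in order to affect it.

// hypothetical calculated field, e.g. dyn.NumProps_in_City
{ FIXED [City] : COUNTD([Property ID]) }

Once the ranking filter is added to context, the FIXED expression is evaluated only over the rows that survive the filter, and the totals line up with the view.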

Can someone explain the following Essbase code: FIX, #relative

Can someone please explain the below Essbase code to me? This is my first time looking at any Essbase code and I'm getting a bit confused as to what it is actually doing.
FIX(&Mth, &Yr, &Version,
    "Sector1", "Sector2", #relative("Source Code", 0), #relative("Channel", 0))
    FIX("AccountNo", "DepNo")
        DATACOPY "1A11"->"A-500" TO "1BCD"->"C-800";
    ENDFIX
ENDFIX
From what I have googled, the following is my understanding:
The outer FIX creates a new command block which restricts database calculations to this subset.
It passes the following members into the command to be used:
Mth
Yr
Version
It returns the following fields:
Sector1
Sector2
#relative("Source Code", 0) returns the level-0 members of the Source Code member, meaning it returns the members of Total Source Code that have no children (no other dimensions).
#relative("Channel", 0) returns the level-0 members of the Channel member, meaning it returns the members of Channel that have no children (no other dimensions).
The inner FIX begins a new command block and passes the following members into the command to be used:
AccountNo
DepNo
The DATACOPY copies the range of cells 1A11, A-500 over to the range 1BCD, C-800.
The above is what I understand from the Oracle documents on each of the functions, but I can't actually figure out what is happening.
Welcome to the world of Essbase; it can be a little daunting at first, especially if you're new to the concept of multidimensionality. You are on the right track in analyzing your calc script.
Try not to think of the FIX statement as a command block, per se. A FIX is used to select a portion of the cells in the cube. Every single piece of data in your cube has a particular address that consists of one member from every dimension, plus the actual data value itself. For instance, a cube with the dimensions Time, Year, Scenario, and Location might have a particular piece of data at Jan->2018->Actual->Washington. The number of possible permutations of data in a cube can get very large very quickly. For example, if your organization has 4 years of data, 12 months in a year, 100 locations, 10,000 accounts, 3 versions, and 10 departments, you are talking about 4 * 12 * 100 * 10000 * 3 * 10 = 1.4 billion different potential addresses (cells) of data, and that's actually fairly small for a cube, as they tend to grow much larger.
That said, FIX statements are used to narrow down the scope of your calculation: rather than operating on the ENTIRE cube (all 1.4 billion cells in my hypothetical example), the FIX restricts the calculation to cells that match the criteria you specify. In this case, the first FIX statement restricts the calculation to a particular month, year, version, sectors, sources, and channels. Note that the ampersand on Mth, Yr, and Version means that a substitution variable is used. This means your server or cube has substitution variable values set, such as Mth = "Jan" and Yr = "FY2018", while Version might be "Working" or "Final" or something similar. I would guess that Sector1 and Sector2 are two different members from the same dimension. #RELATIVE("Source Code", 0) (canonically written @RELATIVE in Essbase calc scripts) is a function that finds the level-0 members of the specified member, that is, the leaf/bottom-level members of a dimension, the ones with no children below them.
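To make the substitution concrete (the actual values live on the server, so these are assumptions for illustration): with Mth = "Jan", Yr = "FY2018", and Version = "Working", the outer FIX behaves as if it had been written:

FIX("Jan", "FY2018", "Working",
    "Sector1", "Sector2", #relative("Source Code", 0), #relative("Channel", 0))
    ...
ENDFIX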
In other words, the first FIX statement narrows the scope of the calculation to a particular month in a particular year in a particular version (as opposed to all months, all years, and all versions), and within that month/year/version (for either Sector1 or Sector2) it fixes on all of the level-0/bottom/leaf members in the Source Code and Channel dimensions.
The next FIX statement simply narrows the current scope of cells further, in addition to the outer FIX. It's not uncommon to see FIX statements nested like this.
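In fact (a sketch, not your actual script), when the members of the inner FIX come from dimensions the outer FIX doesn't touch, the nesting is equivalent to a single combined FIX over the intersection:

FIX(&Mth, &Yr, &Version,
    "Sector1", "Sector2",
    #relative("Source Code", 0), #relative("Channel", 0),
    "AccountNo", "DepNo")
    DATACOPY "1A11"->"A-500" TO "1BCD"->"C-800";
ENDFIX

Nesting is often preferred anyway for readability, or when several commands share the outer FIX but need different inner scopes.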
Lastly we get to the part where something actually happens: the DATACOPY. Within the given FIX context, this DATACOPY command says that for EACH cell in the current FIX, values are copied from the source to the destination. DATACOPY is a little more straightforward when it's just DATACOPY "Source" TO "Target", as opposed to using the cross-dimensional operator (->), but it is perhaps more easily understood in terms of the time/year dimensions. For example, imagine the data copy were written like this:
DATACOPY "FY2018"->"Dec" TO "FY2019"->"Jan";
In this DATACOPY I'd be telling Essbase that, for the given FIX context, I would like to copy values from the end of the year (data values where the year is FY2018 AND the month is Dec) to the beginning of the next year (data values where the year is FY2019 AND the month is Jan). Your DATACOPY works in a similar fashion, but with cost centers or something else; it all depends on how the cube is set up.

Merge two CSV and collate data

I have two CSV files, the first like so:
Book1:
ID,TITLE,SUBJECT
0001,BLAH,OIL
0002,BLAH,HAMSTER
0003,BLAH,HAMSTER
0004,BLAH,PLANETS
0005,BLAH,JELLO
0006,BLAH,OIL
0007,BLAH,HAMSTER
0008,BLAH,JELLO
0009,BLAH,JELLO
0010,BLAH,HAMSTER
0011,BLAH,OIL
0012,BLAH,OIL
0013,BLAH,OIL
0014,BLAH,JELLO
0015,BLAH,JELLO
0016,BLAH,HAMSTER
0017,BLAH,PLANETS
0018,BLAH,PLANETS
0019,BLAH,HAMSTER
0020,BLAH,HAMSTER
And then a second CSV with items associated with the first list, with ID being the common attribute between the two.
Book2:
ID,ITEM
0001,PURSE
0001,STEAM
0001,SEASHELL
0002,TRUMPET
0002,TRAMPOLINE
0003,PURSE
0003,DOLPHIN
0003,ENVELOPE
0004,SEASHELL
0004,SERPENT
0004,TRUMPET
0005,CAR
0005,NOODLE
0006,CANNONBALL
0006,NOODLE
0006,ORANGE
0006,SEASHELL
0007,CREAM
0007,CANNONBALL
0007,GUM
0008,SERPENT
0008,NOODLE
0008,CAR
0009,CANNONBALL
0009,SERPENT
0009,GRAPE
0010,SERPENT
0010,CAR
0010,TAPE
0011,CANNONBALL
0011,GRAPE
0012,ORANGE
0012,GUM
0012,SEASHELL
0013,NOODLE
0013,CAR
0014,STICK
0014,ORANGE
0015,GUN
0015,GRAPE
0015,STICK
0016,BASEBALL
0016,SEASHELL
0017,CANNONBALL
0017,ORANGE
0017,TRUMPET
0018,GUM
0018,STICK
0018,GRAPE
0018,CAR
0019,CANNONBALL
0019,TRUMPET
0019,ORANGE
0020,TRUMPET
0020,CHERRY
0020,ORANGE
0020,GUM
The real datasets are millions of records, so I'm sorry in advance for my simple example.
The problem I need to solve is getting the data merged and collated in a way that shows which item groupings most commonly appear together under the same ID (e.g., GRAPE, GUM, SEASHELL appear together 340 times; ORANGE and STICK, 89 times; etc.).
Then I need to see whether the common groupings change or deviate from the overall results when grouped by SUBJECT.
Tools I'm familiar with are Excel and SQL, but I also have PowerBI and Alteryx at my disposal.
Full disclosure: Not homework, or work, but a volunteer project, thus my unfamiliarity with this kind of data manipulation.
Thanks in advance.
An Alteryx solution:
Drag the two .csv files onto your canvas (seen as book1.csv and book2.csv in my picture); Alteryx will create "Input" tools for you.
Drag a "Join" tool on and connect the two .csv files to its inputs; select "ID" as the join field; deselect "Right_ID" as an output, since it's merely a duplicate of "ID".
Drag a "Summarize" tool on and connect the Join tool's output to its input; add all three of the other output fields as a "Group By", then add the ID column with a "Count".
Drag a "Browse" tool on and connect the Summarize tool's output to its input.
Run the workflow.
After all that, click on the Browse tool and you should see the output from my screenshot, which shows just the first ten rows.
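For reference, since you mention knowing SQL: the flow above corresponds roughly to the following query (table and column names taken from the example data):

SELECT b1.TITLE, b1.SUBJECT, b2.ITEM, COUNT(b1.ID) AS id_count
FROM Book1 AS b1
JOIN Book2 AS b2 ON b2.ID = b1.ID
GROUP BY b1.TITLE, b1.SUBJECT, b2.ITEM;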
+1 for taking on a volunteer project - I think anyone who knows data can have a big impact in support of their favourite group or cause.
I would just pull the 2 files into Power BI as 2 separate tables (Get Data / From File). Create a relationship between the 2 tables based on ID (it might get auto-generated). It should be one to many.
Then I would add a Calculated Column to the Book1 table to concatenate the related ITEM values, e.g.:
Items =
CALCULATE (
    CONCATENATEX (
        DISTINCT ( 'Book2'[ITEM] ),
        'Book2'[ITEM],
        ", ",
        'Book2'[ITEM], ASC
    )
)
Now you can use that Items field in visuals (e.g. a Table), along with Count of ID to get the frequency.
Adding Subject to a copy of the table (e.g. to the Columns well of a Matrix) will produce your grouped scenario, or you could add a Subject Slicer.
As you will be comparing subsets of varying size, I would change Count of ID to Show value as - % of grand total.
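Equivalently, the percent-of-grand-total can be written as an explicit measure (a sketch; the measure name is made up), which is handy if you want to reuse it across visuals:

ID Count % =
DIVIDE (
    COUNTROWS ( 'Book1' ),
    CALCULATE ( COUNTROWS ( 'Book1' ), ALLSELECTED ( 'Book1' ) )
)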
A slightly different solution using Alteryx.
With this dataset, there are very few repeating 3- or 4-item groups. You can do the two-item affinity analysis and derive a probability for 3- or 4-item groups, or you can count the 3- and 4-item groups individually. I believe what you want is the latter, since your probability of getting grapes with oranges may be altered by whether or not you have bananas in the cart.
Anyway, I did not join in the subject until after finding all of the combinations. I found all the combinations by taking the Cartesian join of two, then three, then four copies of the original set. I then removed duplicates by ensuring items were always in alphabetical order within each row, and counted the occurrences of each combination. More joins can be added in the same pattern to count groups of 5, 6, 7, ...
Once you have the counts of occurrences, join back to the subjects, perform the same analysis per group, and compare to the overall results.
I'm supposed to disclose that I work for Alteryx.
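For readers without Alteryx, here is a minimal pandas sketch of the same idea (the file names are assumptions): join the two books on ID, enumerate the item pairs within each ID in alphabetical order so duplicates collapse, and count each pair overall and per SUBJECT.

from itertools import combinations

import pandas as pd

book1 = pd.read_csv("book1.csv", dtype={"ID": str})  # assumed file names
book2 = pd.read_csv("book2.csv", dtype={"ID": str})

merged = book2.merge(book1[["ID", "SUBJECT"]], on="ID")

# Enumerate every unordered pair of items per ID; sorting the items
# first makes (A, B) and (B, A) count as the same pair.
rows = [
    (subject, a, b)
    for (id_, subject), grp in merged.groupby(["ID", "SUBJECT"])
    for a, b in combinations(sorted(grp["ITEM"].unique()), 2)
]
pairs = pd.DataFrame(rows, columns=["SUBJECT", "ITEM_A", "ITEM_B"])

# Overall co-occurrence counts, most frequent first
overall = pairs.groupby(["ITEM_A", "ITEM_B"]).size().sort_values(ascending=False)

# The same counts broken out by SUBJECT, to spot deviations
by_subject = pairs.groupby(["SUBJECT", "ITEM_A", "ITEM_B"]).size()

print(overall.head(10))
print(by_subject.head(10))

Extending combinations(..., 2) to 3 or 4 mirrors the extra Cartesian joins in the Alteryx flow.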
First of all, if you are using Windows, just navigate to the directory that contains the CSV files and run the following command:
copy pattern newFileName.csv
rem example:
copy *.csv merged.csv
This creates one merged CSV file. If the file is now too large to process in one go, use whatever your programming language offers: in Python you can use generators to process it line by line, or with pandas you can read it chunk by chunk, which makes it easy.
I hope this helps.
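As a minimal sketch of the chunk-by-chunk pandas approach just mentioned (the file name, chunk size, and column name are assumptions):

import pandas as pd

# Stream the merged file in 500,000-row chunks and accumulate counts
# without ever holding the whole file in memory.
counts = {}
for chunk in pd.read_csv("merged.csv", chunksize=500_000):
    for item, n in chunk["ITEM"].value_counts().items():
        counts[item] = counts.get(item, 0) + n

print(sorted(counts.items(), key=lambda kv: -kv[1])[:10])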

Two Dimensional Diagram with aggr function

I'm having a very curious problem in QlikView.
I have a number of readouts from a database which show the amount of time spent in various states.
In that table there are 49 fields that describe the state: there are 7 levels of the SOC and 7 states of the temperature.
For example, one of the fields could be named SOC1_T1 or SOC2_T1, and so on.
So what I get is a table full of readouts in which each row has a specific id for the object, the state variables, and an age. There are multiple entries per object.
What I want to do is plot a two-dimensional chart over all the states, so that I get a SOC-over-temperature histogram (the average of the maximum, i.e. newest, value of every object).
I tried creating two dynamic (or synthetic) dimensions, ValueLoop(1,7) and ValueLoop(1,8).
In the expressions I referred to them with:
=If(ValueLoop(1,7) = 1 and ValueLoop(1,8) = 1,
    Avg(Aggr(FirstSortedValue(SOC1_T1, -age), id)) * 100)
and created 49 expressions, one for each state variable.
The problem now is that only the first entry is shown. If I replace the whole expression inside the If condition with a specific number (100), I get a result. I also put the inner expression into a list box and checked that the result is not null.
As soon as I delete the Aggr function and just take the Avg over everything (which is not what I want), everything works fine. When I go back to Aggr, only the first one is shown.
It doesn't help, by the way, to delete one of the dimensions; this doesn't work one-dimensionally either.
Any ideas or workarounds?
Greetings
Julian

How can I summarize and reuse a complex dataset

How can I re-use a single complex dataset across a number of tables?
The dataset has a number of computed columns that need to be reported both in detail and in summary. Here's a very simplified example dataset:
is_food  sale_association  food_type  total_sold  total_associations  percent_total
1        Before Movie      Popcorn    50          3                   x BirtMath.safeDivide(...)
0        Before Movie      Soda       10          2                   x BirtMath.safeDivide(...)
1        During Movie      Jujubee    10          1                   x BirtMath.safeDivide(...)
0        After Movie       Soda       15          2                   x BirtMath.safeDivide(...)
From this one dataset, I'd want to create a detailed summary of all food types with non-food rolled up (using the 'is_food' column), another summary of all food types, another detailed summary of food with non-food rolled up by sale_association, etc.
The report would also contain a number of percentages (6 in the most complex table) that need to be calculated, some across a row, others across all rows in a given group. All of them can have a zero value for the denominator, so they need to be guarded with safeDivide, which is a pain to do in the source SQL query since the query itself is aggregating: checking for divide-by-zero when both the numerator and denominator are sums leads to hairy queries.
Obviously I could do this by focusing the SQL query as appropriate, but it seems like a waste of time and effort to create 12 or 15 queries that are very similar when I've already managed to create the monster query for the most detailed table.
What doesn't seem straightforward is how to perform the rollups in a table. I managed to hack something together by hiding rows that would later be summed up (e.g. "is_food == 0" in the example) and then creating custom data bindings that are displayed in a footer row. Not only does it feel like a hack, it also interferes with the ability to order rows naturally. Again, going back to the example, if I were ordering by total_sold and summarizing the rows with is_food == 0, the natural order should be Popcorn, Non-food, Jujubee.
There's nothing in the BIRT wiki about this, nor does "BIRT: A Field Guide, 3rd Edition" really delve into the topic.
This seems like a fairly open-ended question (although I agree that reusing a single dataset makes much more sense than having multiple queries retrieve the same data in slightly different ways). A few general suggestions:
Use the most detailed version of the data required as a common dataset for each BIRT report item (typically BIRT tables).
Where summary-only reporting is required, add groups to the BIRT table at the desired level, add data items as required to the group headers/footers, and delete the detail-level row(s) from the BIRT table.
Where detail-level reporting is required in some cases (e.g. for food items but not for non-food items), add groups to the BIRT table as above, set the visibility of the detail row (in Property Editor - Properties - Visibility) by checking Hide Element, then specify an appropriate expression to suppress the unwanted rows (non-food items, in this example; see the sketch after this list).
Aggregations (i.e. summary expressions) can be added to tables by selecting the whole table, selecting the Binding tab within the Property Editor, and clicking the Add Aggregation... button.
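For the visibility step above, a hedged sketch of the Hide Element expression (the column name comes from the example dataset; BIRT visibility expressions are JavaScript, and returning true hides the element):

// detail-row visibility expression: hide non-food rows
row["is_food"] == 0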