Business Objects WEBI 3 universes in one report - sap

This has been bugging me since yesterday and I cannot work it out; I hope you can help:
I have a report in Business Objects (using the WEBI interface) that pulls data from 3 different universes: universes 1 and 2 have sold qty information per customer number, and the universe 3 query is just a list of customer numbers and their respective regions. All 3 universes share the same customer numbers, so I merged on that field.
The first task, which I have managed to achieve: I created a table that checks whether there is a sold qty for a particular customer in universe 1. If there is, that qty is shown for the customer; if not, the customer's qty from universe 2 is shown.
What I am struggling with: I need to add the customer region field from the third universe to the same report, looking up the customer number and returning the customer region. The problem is that some customer numbers ARE missing from the third universe, and when I add that field, the sold qty entries for those customers disappear from the table completely. I would still like to see them in the report, with "Null" values for the region.
I have searched for a similar solution across the internet, but I couldn't find anything similar (having 3 universes in the same report); all the solutions I found cover only 2 universes, which I could have replicated myself if needed.
Is this achievable?

Merging dimensions functions as a "join" between the data providers involved. Whether it acts as an "outer join" (roughly speaking) or an "inner join" within a block depends on the types of objects you are combining.
Imagine you have two data providers, DP1 and DP2. They can be from different universes or from the same; what matters is that there is a common dimension which can be merged between them.
DP1 selects dimension "Customer Number", along with other objects. DP2 selects dimension "Customer Number", dimension "Customer Region", and measure "Quantity Sold". Dimension "Customer Number" is the common dimension in the two data providers and will be merged, but DP2 does not contain all of the values which are present in DP1 (in the interest of simplifying the example, let's say DP1 does contain all of the values in DP2).
Including the merged dimension "Customer Number" and "Quantity Sold" in the same block will return all of the customer numbers in both data providers, with blank values for "Quantity Sold" for missing values in DP2. This is the equivalent of an outer join, and whether it is a left, right, or full outer join depends on other options, which are well described here:
http://www.dagira.com/2010/06/19/what-does-extend-merged-dimensions-really-do/
Including the merged dimension "Customer Number" and "Customer Region" in the same block will restrict customer numbers to only those found in DP2. This is the equivalent of an inner join, and it can bring other limitations, such as incompatible objects. In your case you may need a detail object, if you are able to adapt the universe; some more useful explanations are here:
https://michaelwelter.wordpress.com/2011/04/18/tips-for-merging-dimensions/
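As a toy model of the behaviour described above (it illustrates the join semantics only, not how WEBI works internally), the sketch below treats each data provider as a Python dict keyed by customer number. The customer numbers, quantities and regions are made-up sample values, and `block_rows` is a hypothetical name.

```python
# Toy model of a WEBI block built on a merged "Customer Number" dimension.
# Sample data is hypothetical: DP1 holds quantity sold, DP2 holds region
# and is missing customer "C2".
sales = {"C1": 10, "C2": 5, "C3": 7}      # DP1: Customer Number -> Quantity Sold
regions = {"C1": "North", "C3": "South"}  # DP2: Customer Number -> Customer Region


def block_rows(sales, regions, region_as_dimension):
    """Return (customer, region, quantity) rows for one block.

    region_as_dimension=True mimics placing Customer Region in the block as a
    dimension from another data provider: only customers present in both
    data providers survive (inner-join-like). False mimics the result the
    question asks for: every customer is kept, with None for a missing
    region (left-outer-join-like).
    """
    if region_as_dimension:
        customers = [c for c in sales if c in regions]
    else:
        customers = list(sales)
    return [(c, regions.get(c), sales[c]) for c in customers]


print(block_rows(sales, regions, region_as_dimension=True))
# [('C1', 'North', 10), ('C3', 'South', 7)]
print(block_rows(sales, regions, region_as_dimension=False))
# [('C1', 'North', 10), ('C2', None, 5), ('C3', 'South', 7)]
```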
Therefore I think this problem is not related to having three universes in the same document, but rather the types of objects you are selecting from each.

Related

How to populate all possible combination of values in columns, using Spark/normal SQL

I have a scenario where my original dataset looks like this:
Data:
Country,Commodity,Year,Type,Amount
US,Vegetable,2010,Harvested,2.44
US,Vegetable,2010,Yield,15.8
US,Vegetable,2010,Production,6.48
US,Vegetable,2011,Harvested,6
US,Vegetable,2011,Yield,18
US,Vegetable,2011,Production,3
Argentina,Vegetable,2010,Harvested,15.2
Argentina,Vegetable,2010,Yield,40.5
Argentina,Vegetable,2010,Production,2.66
Argentina,Vegetable,2011,Harvested,15.2
Argentina,Vegetable,2011,Yield,40.5
Argentina,Vegetable,2011,Production,2.66
Bhutan,Vegetable,2010,Harvested,7
Bhutan,Vegetable,2010,Yield,35
Bhutan,Vegetable,2010,Production,5
Bhutan,Vegetable,2011,Harvested,2
Bhutan,Vegetable,2011,Yield,6
Bhutan,Vegetable,2011,Production,3
There is also a very small country lookup table that lists all the countries the source data can possibly contain.
I want the number of columns in the output data to be always fixed (to ensure the reporting/visualization tool doesn't get a dynamic number of columns with every day's new source data ingestion, depending on the varying number of distinct countries present).
So I have to somehow join the source data with the country_lookup csv and populate each of those columns with a default value of F. Every country column would be binary, with T or F as the possible values.
The original dataset above has to be converted into the following:
Data (for rows whose Type is Derived Yield, I've left the Amount as an unevaluated expression rather than computing it, for easier understanding and so you can match it against the formula):
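The fixed-width flag columns amount to a left join of each source row against the lookup list. A minimal Python sketch (plain Python, not Spark), assuming the lookup table contains exactly the six countries that appear as columns in the expected output; `ALL_COUNTRIES` and `add_country_flags` are hypothetical names.

```python
# Fixed column order; assumed to mirror the country lookup table.
ALL_COUNTRIES = ["US", "Argentina", "Bhutan", "India", "Nepal", "Bangladesh"]


def add_country_flags(row):
    """Append one T/F column per lookup country to a source row.

    row is a (Country, Commodity, Year, Type, Amount) tuple. Every flag
    defaults to "F"; only the row's own country is "T", so the number of
    output columns stays fixed regardless of which countries appear.
    """
    country = row[0]
    return tuple(row) + tuple("T" if c == country else "F" for c in ALL_COUNTRIES)


print(add_country_flags(("US", "Vegetable", 2010, "Harvested", 2.44)))
# ('US', 'Vegetable', 2010, 'Harvested', 2.44, 'T', 'F', 'F', 'F', 'F', 'F')
```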
Country,Commodity,Year,Type,Amount,US,Argentina,Bhutan,India,Nepal,Bangladesh
US,Vegetable,2010,Harvested,2.44,T,F,F,F,F,F
US,Vegetable,2010,Yield,15.8,T,F,F,F,F,F
US,Vegetable,2010,Production,6.48,T,F,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
US,Vegetable,2011,Harvested,6,T,F,F,F,F,F
US,Vegetable,2011,Yield,18,T,F,F,F,F,F
US,Vegetable,2011,Production,3,T,F,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
US,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Argentina,Vegetable,2010,Harvested,15.2,F,T,F,F,F,F
Argentina,Vegetable,2010,Yield,40.5,F,T,F,F,F,F
Argentina,Vegetable,2010,Production,2.66,F,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Argentina,Vegetable,2011,Harvested,10,F,T,F,F,F,F
Argentina,Vegetable,2011,Yield,90,F,T,F,F,F,F
Argentina,Vegetable,2011,Production,9,F,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Bhutan,Vegetable,2010,Harvested,7,F,F,T,F,F,F
Bhutan,Vegetable,2010,Yield,35,F,F,T,F,F,F
Bhutan,Vegetable,2010,Production,5,F,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Bhutan,Vegetable,2011,Harvested,2,F,F,T,F,F,F
Bhutan,Vegetable,2011,Yield,6,F,F,T,F,F,F
Bhutan,Vegetable,2011,Production,3,F,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Formulae for populating Amount Field for Derived Type:
Derived Amount = (sum of Harvested over all countries marked T, grouped by Year and Commodity) divided by (sum of Production over all countries marked T, grouped by Year and Commodity).
So the target is to generate every combination of the countries in the source and, for each combination, calculate the sums of the respective Harvested and Production values and divide one by the other. In the actual scenario a country can have more than one commodity, but that shouldn't matter, since the amounts are summed per commodity and year.
Note: users in the frontend can select any combination of countries. The sole reason for doing this in the backend rather than dynamically in the frontend is that AWS QuickSight (our visualisation tool) can compute sums over selected column filters but doesn't yet support calculations on those derived summed fields. Hence, the calculation for every combination of countries has to be pre-populated (a very naive approach) so that it is available in the report whichever countries the user selects.
Also, if you have a better approach than the naive one described in the note, please guide me. I've also posted a separate question on the same problem, without my expected approach, so that experts can show me how to solve this kind of problem better. If you want to help solve it with some other technique, you're most welcome; here is the link to that question.
Any help will be greatly appreciated.
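A minimal sketch of the pre-computation in plain Python (not Spark), assuming the same fixed country list as the output columns and treating the Harvested and Production rows as the only inputs to the formula. It emits one Derived Yield row per (Commodity, Year) and per combination of two or more countries present in that group; the expected output above additionally repeats each combination under every member country, which is a simple fan-out of these rows. `derived_yield_rows` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

# Fixed column order; assumed to mirror the country lookup table.
ALL_COUNTRIES = ["US", "Argentina", "Bhutan", "India", "Nepal", "Bangladesh"]


def derived_yield_rows(rows):
    """rows: iterable of (Country, Commodity, Year, Type, Amount) tuples.

    Returns (Commodity, Year, "Derived Yield", Amount, *flags) tuples where
    Amount = sum(Harvested) / sum(Production) over the countries flagged T.
    """
    # (commodity, year) -> country -> {"Harvested": x, "Production": y}
    groups = defaultdict(lambda: defaultdict(dict))
    for country, commodity, year, typ, amount in rows:
        if typ in ("Harvested", "Production"):
            groups[(commodity, year)][country][typ] = amount

    out = []
    for (commodity, year), by_country in sorted(groups.items()):
        present = [c for c in ALL_COUNTRIES if c in by_country]
        for size in range(2, len(present) + 1):
            for combo in combinations(present, size):
                harvested = sum(by_country[c].get("Harvested", 0.0) for c in combo)
                production = sum(by_country[c].get("Production", 0.0) for c in combo)
                amount = harvested / production if production else None
                flags = ["T" if c in combo else "F" for c in ALL_COUNTRIES]
                out.append((commodity, year, "Derived Yield", amount, *flags))
    return out
```

With the six lookup countries there are at most 57 combinations per group (2^6 minus the empty set and the 6 single-country sets), but the count roughly doubles with every added country, which is the main limit of this pre-populating approach.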

How to show dimension members for which there are no values in fact in SSAS Multi-Dimensional?

I have two tables in my Ticket Management Application, "ExpositionPeriods" and "OrganisedVisits".
ExpositionPeriods - Defines the periods for which tickets can be purchased.
OrganisedVisits - Stores the tickets purchased information.
In the example below, we have 5 periods available, and tickets have been purchased for 2 of the periods.
The customer wants a report showing the number of visitors for each available period. That means that for any period with no visitors, the report should show "0" for that period.
So far so good. Since the production database is huge (~500 GB), reporting directly against it is not advisable. Things become challenging when I create an OLAP cube from this schema and try to reproduce the same report in the cube. The cube seems to behave like a SQL INNER JOIN rather than a LEFT OUTER JOIN, so I do not see the periods for which no tickets were sold.
Is this how SSAS actually behaves? Am I missing a particular setting that tells the SSAS engine to process the cube differently so that the missing periods are included as well? Please note that end customers don't have access to MDX/DAX scripts; they can only use the cube by dragging and dropping measures and dimensions, as in an Excel pivot table.
In your screenshot, the browser is applying NON EMPTY to the rows of the date dimension. If you want to show the dates with no visitors, select the option to show empty cells.
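For reference, the relational report the customer wants is a left outer join from periods to visits, with a count that defaults to 0. A minimal Python sketch of the two behaviours, with hypothetical period IDs standing in for ExpositionPeriods and OrganisedVisits rows:

```python
from collections import Counter


def visitors_per_period(period_ids, visit_period_ids):
    """Count visits per period, keeping periods with no visits (as 0).

    period_ids: every period from ExpositionPeriods.
    visit_period_ids: the period ID of each OrganisedVisits row.
    """
    counts = Counter(visit_period_ids)
    # Iterating over all periods (not over counts) gives LEFT OUTER JOIN
    # behaviour; a Counter returns 0 for a missing key.
    return {p: counts[p] for p in period_ids}


def visitors_inner_join(visit_period_ids):
    """Only periods that actually have visits, like an INNER JOIN."""
    return dict(Counter(visit_period_ids))


periods = [1, 2, 3, 4, 5]
visits = [2, 2, 4]
print(visitors_per_period(periods, visits))  # {1: 0, 2: 2, 3: 0, 4: 1, 5: 0}
print(visitors_inner_join(visits))           # {2: 2, 4: 1}
```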
You can also define a scope like this in the cube's "Calculations":
SCOPE
([Measures].[Visitors]);
THIS=IIF(ISEMPTY([Measures].[Visitors]),0,[Measures].[Visitors]);
END SCOPE;
This gives you zeros instead of nulls, with a side effect: neither you nor the customer can hide empty cells any more, because the cells are no longer empty.

Excel - extracting values from data table for list validation

Background:
I have a workbook/tool that performs exactly as I wish.
The tool includes two main components:
A list of ingredients grouped by category (e.g. dairy, produce)
A sheet with recipe tables
On the ingredient sheet, all ingredients of a specific category are grouped together as part of a named range (e.g. "dairy_list").
[Named Range: Dairy_list]
[Ingredient] [Unit] [Price]
Milk, 2% 1 litre $2.50
Milk, Whole 1 litre $3.50
[/Named Range: Dairy_list]
[Named Range: Fruit_list]
[Ingredient] [Unit] [Price]
Apples pound $0.99
Bananas pound $1.99
[/Named Range: Fruit_list]
Recipe tables include a row for each ingredient (note: there may be hundreds of ingredient rows once the model is complete). Each row has two columns: the first contains a list-validated drop-down menu with the names of all categories; the second is a list-validated drop-down containing all ingredients in the selected category. I do this using the named ranges and the INDIRECT([category cell]&"_list") function for the list validation. So if the category "Dairy" were selected, the validation list would be the range "Dairy_list".
This works; however, some challenges exist from a usability standpoint.
Challenges:
The users of this tool are not experienced with Excel. Users must be able to easily add rows to each category and add categories to the model.
Explaining how to define a named range, insert rows within a range and ensuring ranges' naming conventions are valid are all relatively simple tasks for experienced users but not for many of the intended users of the tool.
Users also may wish to sort the ingredient list in different ways (using auto filter), which is not possible when the table is divided into subgroups and categories.
I am rebuilding the model template to allow for this usability and to simplify the user experience.
Objective:
The format of the ingredient data table will be single, sortable table:
[Category] [Ingredient]
Fruit Apple
Fruit Banana
Vegetable Carrot
Fruit Melon
A named range of possible categories will be contained on a separate sheet.
Ideally, the user need only add a category to the category list or add an ingredient to the ingredient list, with no further steps required other than populating recipes.
The user should be able to add (within reason) as many categories as they want. Similarly, on the ingredient sheet, the user may add (within reason) as many ingredients for each category and in any order they wish. This is important because the solutions I have found on this site and others for dynamic validation lists typically involve creating an additional list/named range that groups list items by category then applies a named range to each list.
[Fruit] [Vegetables]
Apple Carrot
Banana
Melon
This approach, while functionally relevant, is essentially a less elegant version (the data must reside in multiple locations) of what my current model uses and one which I am trying to avoid (I want to avoid as many steps as possible for the user).
What I wish the model to do in the recipe tables is, wherever the category cell contains [Fruit] the ingredient cell only lists items from the ingredient list where the [category] is Fruit. But not require the data to be manually extracted or grouped within the ingredient list.
I've experimented with VBA auto-filtering, but I do not want to alter the filter, format or ordering settings on the ingredient list if the user has sorted a certain way for usability.
If I were constructing this in another programming language that referenced a database, the equivalent functionality would be a SQL statement like "SELECT [ingredient] FROM [Ingredient_List] WHERE [Category] = 'Fruit'".
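To pin down the intended semantics of that WHERE clause (this is Python, not an Excel or VBA solution), here is a sketch using the sample rows from the objective. It scans the table in its current order, so the user's sorting is respected and no per-category grouping is needed; `ingredients_for` is a hypothetical name.

```python
# Sample rows from the single ingredient table: (Category, Ingredient).
INGREDIENTS = [
    ("Fruit", "Apple"),
    ("Fruit", "Banana"),
    ("Vegetable", "Carrot"),
    ("Fruit", "Melon"),
]


def ingredients_for(category, table=INGREDIENTS):
    """Equivalent of SELECT [Ingredient] FROM table WHERE [Category] = category.

    The table is scanned in its current order, so rows need not be
    grouped or sorted by category.
    """
    return [ingredient for cat, ingredient in table if cat == category]


print(ingredients_for("Fruit"))  # ['Apple', 'Banana', 'Melon']
```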
I am open to VBA or non-VBA solutions (preferably ones that are relatively backward compatible as I do not have control over Excel version).
Thank you in advance for any thoughts/direction/resources/solutions.

How can I summarize and reuse a complex dataset

How can I re-use a single complex dataset across a number of tables?
The dataset has a number of computed columns that need to be reported both in detail and in summary. Here's a very simplified example dataset:
is_food sale_association food_type total_sold total_associations percent_total
1 Before Movie Popcorn 50 3 x BirtMath.safeDivide(...)
0 Before Movie Soda 10 2 x BirtMath.safeDivide(...)
1 During Movie Jujubee 10 1 x BirtMath.safeDivide(...)
0 After Movie Soda 15 2 x BirtMath.safeDivide(...)
From this one dataset, I'd want to create a detailed summary of all food types while rolling up non food (using the 'is_food' column), another summary of all food types, another detailed summary of food with rolled up non-food by sale_association, etc. etc.
The report would also contain a number of percentages (6 in the most complex table) that need to be calculated (some across a row, others across all rows in a given group), all of which can have a zero value for the denominator and so need to be guarded against with safeDivide (which is a PITA to do in the source SQL query which itself is doing aggregation -- checking for divide by zero when both the numerator and denominator are sums leads to hairy queries).
Obviously I can do this by tailoring the SQL query to each table, but it seems like a waste of time and effort to create 12 or 15 very similar queries when I've already managed to write the monster query for the most detailed table.
What doesn't seem straightforward is how to perform the rollups in a table. I managed to hack something together by hiding rows that would later be summed up (e.g. "is_food == 0" in the example) and then creating custom data bindings that are displayed in a footer row. Not only does it feel like a hack, it also interferes with the ability to naturally order rows. Again, going back to the example, if I was ordering by total_sold and summarizing rows with is_food == 0, the natural order should be Popcorn, Non-food, Jujubee.
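The roll-up described above (food items kept in detail, non-food collapsed into one row, then ordered by total_sold) can be stated precisely outside BIRT. A minimal Python sketch using the example rows; `safe_divide` is a hypothetical stand-in for the divide-by-zero guard that BirtMath.safeDivide provides, and taking percent_total as a share of the grand total of total_sold is an assumption.

```python
def safe_divide(numerator, denominator, if_zero=0.0):
    """Return numerator / denominator, or if_zero when the denominator is 0."""
    return if_zero if denominator == 0 else numerator / denominator


def roll_up_non_food(rows):
    """Keep food rows in detail, collapse non-food rows into one "Non-food" row.

    rows: dicts with is_food, food_type and total_sold keys.
    Returns rows sorted by total_sold (descending) with a percent_total share.
    """
    summary = [
        {"label": r["food_type"], "total_sold": r["total_sold"]}
        for r in rows
        if r["is_food"] == 1
    ]
    non_food = [r["total_sold"] for r in rows if r["is_food"] == 0]
    if non_food:
        summary.append({"label": "Non-food", "total_sold": sum(non_food)})

    grand_total = sum(r["total_sold"] for r in summary)
    for r in summary:
        r["percent_total"] = safe_divide(r["total_sold"], grand_total)
    return sorted(summary, key=lambda r: r["total_sold"], reverse=True)


rows = [
    {"is_food": 1, "sale_association": "Before Movie", "food_type": "Popcorn", "total_sold": 50},
    {"is_food": 0, "sale_association": "Before Movie", "food_type": "Soda", "total_sold": 10},
    {"is_food": 1, "sale_association": "During Movie", "food_type": "Jujubee", "total_sold": 10},
    {"is_food": 0, "sale_association": "After Movie", "food_type": "Soda", "total_sold": 15},
]
print([r["label"] for r in roll_up_non_food(rows)])  # ['Popcorn', 'Non-food', 'Jujubee']
```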
There's nothing in the BIRT wiki about this, nor does "BIRT: A Field Guide, 3rd E." really delve into the topic.
This seems like a fairly open-ended question (although I agree that re-using a single dataset makes much more sense than having multiple queries retrieving the same data in slightly different ways). A few general suggestions:
Use the most detailed version of the data required as a common dataset for each BIRT report item (typically BIRT tables)
Where summary-only level reporting is required, add groups to the BIRT table at the desired level, add data items as required to the group headers/footers and delete the detail level row(s) from the BIRT table.
Where detail-level reporting is required in some cases (eg. for food items but not for non-food items), add groups to the BIRT table as above, and set the visibility of the detail row (in Property Editor - Properties - Visibility) to check Hide Element, then specify the appropriate expression to suppress the non-required rows (non-food items, in this example).
Aggregations (i.e. summary expressions) can be added to tables by selecting the whole table, selecting the Binding tab within the Property Editor and clicking the Add Aggregation... button.

Have 2 separate tables or an additional field in 1 table?

I am making a small personal application regarding my trade of shares of various companies.
The actions can be either buying or selling shares of a company. In both cases, the details to be saved would be:
Number of Shares
Average Price
Would it be better to use separate tables for "buy" and "sell" or just use one table for "trade" and keep a field that demarcates "buy" from "sell"?
Definitely the latter case - one table, simple one field (boolean) defining whether it's selling or buying. You should define tables by entities, not by actions taken on them.
This is actually a tricky one. The table you're talking about is basically a trade table, detailing all your buys and sells.
In that sense, you would think it would make sense to have both buys and sells in a single table.
However, in many jurisdictions, there is extra information for a sell order. That piece of information is which buy order to offset it against (for capital gains or profit purposes). While this is not necessary in a strict first-bought, first-sold (FBFS) environment, that's by no means the only possibility.
For example, under Australian law, you can actually offset a sale against your most recent purchase, as long as you have the rationale written down in clear language beforehand. Even though my company follows FBFS, I am allowed to receive bonus issues or supplemental shares which I can then sell immediately. These are offset against the most recent shares bought, not ones I've held for X number of years (this is often handy to minimise taxes payable).
If you follow a strict FBFS, then you don't need that extra information and your trades are symmetrical. Even where they're not, I've implemented it in one table with the extra information, useless for buy orders of course. That seemed the easiest way to go.
You could do it as two asymmetrical tables but that makes queries a bit more problematic since you often need to pull data from both tables. My advice is to stick with a single table with the extra information if needed.
I would also never store the average price. I prefer the quantity, the price per share and the brokerage costs. Every other figure can be calculated from those three, for example:
AvgPrice = (Brokerage + SharePrice * ShareQuant) / ShareQuant
but it's sometimes impossible to work backwards from just the average price, since you don't know what the brokerage was.
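The formula above can be checked with a few lines of Python; the sample trades are made up. Two purchases with different share prices and brokerage end up with the same average, which is why the average alone cannot be split back into its parts.

```python
def avg_price(brokerage, share_price, share_quant):
    """AvgPrice = (Brokerage + SharePrice * ShareQuant) / ShareQuant."""
    return (brokerage + share_price * share_quant) / share_quant


# Hypothetical trades: 10 shares at $4 with $10 brokerage, and
# 10 shares at $5 with no brokerage, give the same average price.
a = avg_price(brokerage=10, share_price=4, share_quant=10)
b = avg_price(brokerage=0, share_price=5, share_quant=10)
print(a, b)  # 5.0 5.0
```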
And I wouldn't have a boolean for buy/sell, it's just as easy to use negative numbers for the sell orders and it makes balance-sheet-type calculations a lot easier since you just sum values irrespective of the order type instead of needing to negate some of them depending on that order type.
Update: If, as you seem to indicate, you're only going to store aggregate information for each company, I would go for the following:
Companies:
CompanyId primary key
CompanyCode indexed
CompanyName
CompanyBuyQuant
CompanyBuyAvgPrice
CompanySellQuant
CompanySellAvgPrice
then you update the individual columns depending on whether it's a buy or sell. You don't need a separate row for the buy/sell stuff. When the company is first added, both quantities and prices are set to 0.
Your entity is now the company so this makes more sense. One thing you may want to consider is to store the aggregate values of shares bought and sold rather than the average buy and sell prices. That will simplify your update calculations and you can still easily get the averages by dividing the aggregate by the quantity.
So, the following table:
Companies:
CompanyId primary key
CompanyCode indexed
CompanyName
CompanyBuyQuant
CompanyBuyValue
CompanySellQuant
CompanySellValue
When adding a company, set all quantities and values to 0.
When buying M shares at N dollars each, add M to CompanyBuyQuant and N * M to CompanyBuyValue.
When selling M shares at N dollars each, add M to CompanySellQuant and N * M to CompanySellValue.
Get average buy price as CompanyBuyValue / CompanyBuyQuant.
Get average sell price as CompanySellValue / CompanySellQuant.
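The update rules above map directly onto a small Python class. This is a sketch with hypothetical class and method names; the fields mirror the Companies columns (CompanyId omitted). It also guards the division for a company that has just been added, where the quantities are still 0, a case the rules above leave implicit.

```python
from dataclasses import dataclass


@dataclass
class Company:
    """One row of the Companies table (aggregate-values variant)."""
    code: str
    name: str = ""
    buy_quant: int = 0
    buy_value: float = 0.0
    sell_quant: int = 0
    sell_value: float = 0.0

    def buy(self, m, n):
        """Record buying M shares at N dollars each."""
        self.buy_quant += m
        self.buy_value += n * m

    def sell(self, m, n):
        """Record selling M shares at N dollars each."""
        self.sell_quant += m
        self.sell_value += n * m

    def avg_buy_price(self):
        # None for a freshly added company, where both figures are still 0.
        return self.buy_value / self.buy_quant if self.buy_quant else None

    def avg_sell_price(self):
        return self.sell_value / self.sell_quant if self.sell_quant else None


c = Company("ABC")
c.buy(100, 50)
c.buy(100, 60)
c.sell(50, 70)
print(c.avg_buy_price(), c.avg_sell_price())  # 55.0 70.0
```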
I'd go with a single table.
You can use negative quantities to indicate a sell. This is a fairly standard sort of indication. Subtraction is the same as adding a negative number!
One table. Each row/item is a trade, whether it's buy or sell.
Also, summing the quantity column gives you your current position, and summing (-1 x quantity x price**) gives you your cash.
Buy or sell is inferred from the sign of the quantity, so there is no need for a separate column, unless you want a computed column derived from quantity.
**cash: When you sell (negative quantity) you get cash back (positive cash), hence -1 multiplier in case anyone wonders.
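A minimal sketch of the signed-quantity approach in Python, ignoring brokerage for simplicity; `position_and_cash` is a hypothetical name and the trades are made-up sample values.

```python
def position_and_cash(trades):
    """trades: (quantity, price) pairs, with negative quantities for sells.

    Position is the plain sum of quantities; cash is the sum of
    -1 * quantity * price, so buys cost cash and sells return it.
    """
    position = sum(q for q, _ in trades)
    cash = sum(-1 * q * p for q, p in trades)
    return position, cash


# Buy 100 at 50, buy 100 at 60, sell 50 at 70.
print(position_and_cash([(100, 50), (100, 60), (-50, 70)]))  # (150, -7500)
```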
"Trade" can be ambiguous and it's not entirely clear to me what you want to do here. Are you interested in storing only your current position in each share or also the history of transactions that show how the position developed?
If you just want to record your holding ("position" might be a better word if you can be short) then I'd simply record for each share the number held. You mention average price, but I'd be cautious about that if you expect at any time to be able to sell part of a holding. What's the average price if you buy 100 at 50, 100 at 60 and sell 50 at 70?
Unless you expect your buy and sell transactions to number in the millions, I'd be more inclined to record each individual purchase or sale as a separate row in a single table and show the totals on demand as the derived results of a simple query.