One to Many Relationship Analysis Services - ssas

Our current relational DB environment is essentially a SKU-counting engine. This makes pulling data through SQL fairly easy, but it is causing me grief when developing a cube. The problem I have is that any given SKU can map to 1-4 different metrics (e.g. a mobile phone case will map to the Cases, Accessory, and Gross Profit metrics). My question is: how should I go about setting up my dimensions for this cube?
Invoices
  InvoiceID
  ProductID
Metrics
  ProductID
  MetricID
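The Metrics table above is effectively a bridge (many-to-many) table between products and metrics. A minimal sketch of how one SKU fans out to several metric rows, using Python's built-in sqlite3 (sample values and the phone-case example are taken from the question; the ProductID values are assumptions for illustration):

```python
import sqlite3

# In-memory database sketching the SKU-to-metrics bridge described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Invoices (InvoiceID INTEGER, ProductID INTEGER);
CREATE TABLE Metrics  (ProductID INTEGER, MetricID TEXT);

-- One phone-case SKU (ProductID 1) mapping to three metrics.
INSERT INTO Invoices VALUES (100, 1);
INSERT INTO Metrics VALUES (1, 'Cases'), (1, 'Accessory'), (1, 'GrossProfit');
""")

# Each invoice line fans out to one row per metric via the bridge join.
rows = conn.execute("""
    SELECT i.InvoiceID, m.MetricID
    FROM Invoices i
    JOIN Metrics m ON m.ProductID = i.ProductID
""").fetchall()
print(rows)  # one (InvoiceID, MetricID) pair per mapped metric
```

In SSAS terms this is the classic setup for a many-to-many dimension: the bridge becomes an intermediate measure group linking the Product and Metric dimensions.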
Thanks in advance for any insight that anyone can offer.

Related

What are the prerequisites and best practices for multidimensional cube design (during the analysis phase)?

I've been assigned to design a multidimensional cube in SSAS.
I am very new to SSAS, and the project is currently in the analysis phase.
I just wanted to ask: is there a standard process or guideline I should follow, or any general questions I should prepare before designing the cube?
One thing the client specifically mentioned is the volume of data:
One service area has 3 million rows, covering 3 years of data.
Does this mean we should plan a partitioning strategy? If yes, what should I be looking at? One thing that comes to mind:
Which field should we use to split the cube (am I heading in the right direction?)
What other factors should I consider during analysis?
SSAS design is a large topic with many different angles. If I were in your shoes, I'd google for "SSAS design" or something along those lines to learn more. For example, here's a sample chapter from a book provided by Microsoft themselves: https://www.microsoftpressstore.com/articles/article.aspx?p=2812063
I'd skip partitioning at this stage. See how the cube performs first and tune it later if really necessary. Partitioning is usually done on some accumulating field, like a date, where old data is not processed daily and only the latest data (partition) is updated (processed). This of course depends on the data you're dealing with.
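The "process only the latest partition" idea can be sketched as follows: given fact rows bucketed into monthly partitions, only the partition containing today's date is reprocessed on the daily run (monthly partition granularity is an assumption for illustration):

```python
from datetime import date

def partitions_to_process(fact_dates, today):
    """Return the set of monthly partitions that need reprocessing.

    Older partitions are assumed static; only the partition containing
    'today' (the latest month) gets processed on the daily run.
    """
    current = (today.year, today.month)
    all_partitions = {(d.year, d.month) for d in fact_dates}
    return {p for p in all_partitions if p == current}

facts = [date(2023, 1, 5), date(2023, 2, 10), date(2023, 3, 1)]
print(partitions_to_process(facts, today=date(2023, 3, 15)))  # {(2023, 3)}
```

This is why an accumulating field like a date is the usual partitioning key: it cleanly separates the slice that changes from the slices that don't.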

Factless Fact Table, but with Facts?

Problem: I am working with a SaaS company that provides monthly services. We are trying to create a data model to track customer-related metrics such as customer count, signups, cancellations, and reactivations. I've done extensive research online, but the closest thing I've found is accumulating snapshots with start/end dates, which don't make sense for a SaaS company where a customer can reactivate an account.
My initial thought is to create a factless fact table for customers; however, this factless table would also have keys to event dimension tables, i.e. DimSignupType, DimCancellationType, DimReactivationType, etc., and boolean measures for isSignup, isCancellation, and isReactivation. This seems counterintuitive, because a factless fact table shouldn't have facts, but I need to track them, and multiple fact tables feel worse because I would have to join them together in the view.
Is there a better approach to this problem?
Edit based on feedback: The main goal is to create a dimensional model that is maintainable, but also something I can build a view on, together with other dimension tables, that allows less technical users to discover insights with tools like Tableau. At the end of the day I need to provide a large flat view with multiple measures and dimensions that allows for easy analytical discovery. Common questions may be: "How many signups do we have MTD for this customer type vs last MTD?", "How many cancellations did we have due to Non-Payment this month compared to last?", "How many reactivations from Non-Payment did we have this month compared to last?", etc.
A lot of this metadata comes from dimension tables I would join to the factless fact table based on keys, but it still requires Signups, Cancellations, and Reactivations being tracked as facts for reporting purposes. So I don't know the best modeling approach that abides by traditional standards. It almost seems like a snapshot fact table that contains keys to dimension tables describing events to be aggregated. I just don't know what that would be called.
I feel the most flexible solution in terms of data management and ease of use would be a factless fact table modeled as a daily snapshot, with "facts" for signups, cancellations, and reactivations that link to type dimensions.

How to build a statistical model to determine whether a website content update boosted sales, given that sales grow naturally

The available data includes ERP data for actual order quantity and revenue, as well as Adobe online analytics data for add-to-cart events and online revenue.
I was asked to determine whether a content update will impact sales, so that we have some proof before rolling out similar updates to all content. However, sales increase naturally over time. How do we build a model that excludes the natural sales increase and provides statistical proof of the increase/decrease caused by the update?
Thanks,
If I understand correctly, there are two possible solutions:
If the natural growth is predictable, you should be able to factor it out by approximation. For instance, if you have a steady 2% monthly sales growth (this can easily be extracted from the ERP), you can roughly subtract it from the results of the updated site. The details of this approach depend greatly on how precise you want the model to be.
Perform A/B testing on the site. In this case you'll get the real figures, but it requires involving your web team.
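The first suggestion can be sketched numerically: project the first month forward at the steady growth rate and subtract that baseline from observed sales, leaving the excess attributable to other causes such as the update (the 2% rate and the sample figures are illustrative assumptions):

```python
def excess_over_baseline(sales, monthly_growth=0.02):
    """Subtract a steady compounding growth baseline from observed sales.

    The baseline projects the first month forward at 'monthly_growth'
    (e.g. 2% per month, as estimated from ERP history); what remains is
    the increase/decrease not explained by natural growth.
    """
    base = sales[0]
    baseline = [base * (1 + monthly_growth) ** i for i in range(len(sales))]
    return [round(s - b, 2) for s, b in zip(sales, baseline)]

# Monthly sales; suppose the content update went live in month 2.
print(excess_over_baseline([100.0, 102.0, 110.0, 113.0]))
```

This is only a rough adjustment; for actual statistical proof (confidence intervals, significance), A/B testing or a proper time-series model with a trend term is the sounder route.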

Fact table design - One-to-many

I come from a relational SQL Server database background, and am trying to make the transition to a multi-dimensional model in Analysis Services.
I'm struggling with how to approach the following problem, which would be incredibly simple in the relational world.
I have 3 tables - Incident, IncidentOffender, and IncidentLoss. There may be none, one, or many IncidentOffenders and IncidentLosses per Incident.
How can I design my data warehouse so that I can ask the cube, for example, "How much time did we spend dealing with incidents on which a bald offender stole baked beans?" as well as "What was the value of those beans?"
Apologies if this sounds simple, but I've scoured the web and devoured various books, but still I cannot find a real-life example of anything like this, which seems like an everyday situation to me.
In your scenario, all three tables need to be loaded into SSAS as both dimensions and measure groups. Then the Incident Offender and Incident Loss dimensions can be many-to-many dimensions for the Incident measure group. It will look like the following in the Dimension Usage tab.
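In relational terms, the question being modeled corresponds to a join across the three tables. A sketch using sqlite3 (the schema and sample values are assumptions based on the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Incident         (IncidentID INTEGER, HoursSpent REAL);
CREATE TABLE IncidentOffender (IncidentID INTEGER, Description TEXT);
CREATE TABLE IncidentLoss     (IncidentID INTEGER, Item TEXT, Value REAL);

INSERT INTO Incident VALUES (1, 2.5), (2, 1.0);
INSERT INTO IncidentOffender VALUES (1, 'bald'), (2, 'tall');
INSERT INTO IncidentLoss VALUES (1, 'baked beans', 3.99), (2, 'bread', 1.20);
""")

# "How much time did we spend on incidents where a bald offender
#  stole baked beans, and what was the value of those beans?"
row = conn.execute("""
    SELECT SUM(i.HoursSpent), SUM(l.Value)
    FROM Incident i
    JOIN IncidentOffender o ON o.IncidentID = i.IncidentID
    JOIN IncidentLoss     l ON l.IncidentID = i.IncidentID
    WHERE o.Description = 'bald' AND l.Item = 'baked beans'
""").fetchone()
print(row)  # (2.5, 3.99)
```

Note that with several offenders or losses per incident, a plain join like this would double-count HoursSpent; that fan-out problem is exactly why the answer recommends many-to-many dimensions in SSAS, which handle the de-duplication for you.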

Best way to track sales/inventory history for a POS system?

So, I'm writing a POS system, and I want it to be able to keep track of an inventory and generate reports based on past sales.
I'm pretty familiar with database design and that sort of thing, but I'm not quite sure how to approach this particular problem. The first thing I thought of was to have tables that track item sales by day, week, month, and year, and then have the program keep track of how much time has elapsed so it knows when to reset those records. But now I'm thinking there has to be a much simpler approach than that.
Another thing I thought of doing is to query the sales transaction table based on time stamps, but I'm not sure if that's a step in the right direction either.
I know that there are simpler ways of doing this for things like orders and order history with customers, but what about for the store itself, if they want to track how much product they've sold over the course of a week, month, year, etc? Is it a similar approach? Different? I can't really find anything that speaks to this particular problem.
I would go with your second thought - create a table for transactions with a timestamp, and use the timestamp to do reports (and partitions if necessary). If you know you will be querying by the timestamp very frequently, you can create an index on it to improve performance.
Whether you are tracking customer orders or store sales shouldn't make a difference in the design unless there is some major requirement difference.
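The timestamp approach can be sketched with sqlite3: one transaction table, an index on the timestamp, and day/week/month/year reports expressed as different groupings over the same data (schema and sample rows are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesTransaction (
    TransactionID INTEGER PRIMARY KEY,
    SoldAt        TEXT,     -- ISO-8601 timestamp
    ItemID        INTEGER,
    Quantity      INTEGER
);
-- Index on the timestamp since reports filter and group by it.
CREATE INDEX IX_SalesTransaction_SoldAt ON SalesTransaction (SoldAt);

INSERT INTO SalesTransaction (SoldAt, ItemID, Quantity) VALUES
    ('2024-01-05 10:30:00', 7, 2),
    ('2024-01-28 16:00:00', 7, 1),
    ('2024-02-03 09:15:00', 7, 4);
""")

# Weekly/monthly/yearly totals are just different strftime groupings
# over the same transaction table; no per-period tables to reset.
monthly = conn.execute("""
    SELECT strftime('%Y-%m', SoldAt) AS Month, SUM(Quantity)
    FROM SalesTransaction
    WHERE ItemID = 7
    GROUP BY Month
    ORDER BY Month
""").fetchall()
print(monthly)  # [('2024-01', 3), ('2024-02', 4)]
```

Swapping the format string (e.g. '%Y' for yearly, '%Y-%W' for weekly) gives the other report granularities without any extra tables.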
Will this be a system where store owners are autonomous or will it be a system with a load of POS terminals that report back to a central hub?
If this is for autonomous store owners you have to start worrying about things like backups and data archiving. Stuff that store owners don't really care about. If you look online you'll probably find some cloud providers that do all this POS stuff for store owners.
On the other hand the general design pattern for larger businesses I have seen is as follows:
On your POS terminals, hold only the minimum data needed at the terminal. Minimal reporting is required at the terminal.
Replicate all POS data to a central database server that keeps and merges data from all the different POS terminals. This is your detailed operational reporting. Once data is replicated here, it can be deleted from the terminal.
Often the store guys aren't too interested in the longer trends but it depends on the business.
Now you can run a report by month or year off the central database server (as can your store owners) and just summarise up to month/year in place. At this point there is no need to create summary tables.
Eventually you'll run into performance issues as data size increases.
The answer to this is not to build summary tables, because then your user / reporting system gets complicated: you have to pick the correct table.
The answer is to apply standard performance tuning techniques such as:
Improving server hardware (adding RAM is often the most cost-effective option)
Adding Indexes (including indexed views)
Implementing partitioning
Consider using cubes for reporting
If this is not sufficient, you might then want to consider the overhead of batch jobs that populate summary tables. But again, indexed views can cover this to a limited extent without requiring summary tables.
You need to understand data sizes, growth, and reporting requirements before considering any design options such as summary tables.