How to deal with many-to-many dimensions in SSAS Tabular? - ssas

I have this model:
Emp Roles is an unpivot that has Manager1 and Manager2 from Contracts and Manager1 and Manager2 from Clients:
Each Contract/Client can appear multiple times in Fact.
Each Fact row belongs to only one Contract and only one Client.
As expected, a given client can have different contracts (and a given contract can belong to multiple clients too).
Therefore the relationship between clients and contracts is many-to-many too.
My problem is that in Power BI, when I combine a manager from Emp Roles with a fact measure, I get some wrong results.
Is there any way for this to work?

Related

Should I use a name as primary key if there is nothing else to use, or should I create IDs for the entities that don't have anything usable as a PK?

I should mention that this is for a database assignment. I'm pretty good with SQL code, but the diagram aspect of the assignment is killing me; I think that every step I take is wrong.
They have given us this scenario and these requirements:
A research team has asked you to create a database for a project on movie production
companies; the project aims to use machine learning, neural networks and other
methods to extract information about the situation of movie production companies in
Europe and the health of this sector for a set of specific countries, including the UK.
The data analytics application resulting from this project – which you DO NOT have to
develop; your job is to develop the central, server-side database that underpins it – has been commissioned by a research institute (which shall remain nameless), and it is
intended to be open source, and therefore available to anyone.
Basically, it is a machine learning application that would run on a database with the aim
to identify the correlation between different aspects of the sector, including funding
opportunities and development of new production companies or studios.
The database records every production company in Europe, including the name of the
company, the address, ZIP code, city, country, type of the company (e.g., non-profit
organisation), number of employees and net worth (calculated as total assets minus
total liabilities). Every production company has its name registered with one and only
one local government authority (for example, Companies House in the UK) on a specific
date; each company can have many shareholders. The authority typically requires
information about all the shareholders, including town of birth, mother’s maiden name,
father’s first name, their personal telephone number (only one), national insurance
number (each country in Europe has a similar unique ID), and passport number. Also,
the registration procedure has a cost associated with it (e.g., 12£ in the UK).
The database also records the employees’ data for each company: each employee is
assumed to work for a single production company. Due to the complex structure of
movie production companies and the need for various skills and professions,
employees are categorised into crew and staff. The crew consists of three main groups:
the actors, the director(s) and those who work on other jobs relevant to the filming
(producers, editors, production designers, costume designers, composer, etc.). All
other employees belong to the staff group, including those responsible for HR,
advertising, etc. Employees are identified by an employee ID, first name, last name and
an optional middle name, date of birth and start date. Also, each employee has their
contact details recorded, whether it is a single phone number or multiple, with a
description associated with each of them. Each employee has a single email address,
too.
Members of the crew are paid hourly, and this is recorded in the database as well as a
bonus that depends on their contract. Actors get a bonus for each day of work and
another bonus for each scene completed; directors get a bonus at the end of the
shooting; crew members that work in other jobs relevant to the filming get a bonus at
the end of the shooting, and they have their role recorded as well (e.g., producer or
costume designer).
Staff members have the monthly salary and the working hours (e.g., full time 9-5).
Furthermore, each staff member belongs to a specific department (e.g., advertising),
which is located in a given building at a given address (both recorded in the database).
The database records all movies from each production company. More specifically, for
each movie the following information is recorded: a universal unique movie code (similar to the ISBN for books), the title of the movie, the year and the first release date
(different release dates are not important and should NOT be recorded).
Also, the database records each member of the crew that is part of the movie, and the
role they have in the movie: each crew member can play a single role or multiple roles
in the same movie, and each role has a description associated with it. For example, in
each movie there can be a single protagonist or more than one, the same actor can play
one or several roles, or even have a cameo.
One of the aims of the project is to provide insights on the impact of funding and grants
within the movie industry. To this end, the database should be able to record all the
funding that each production company receives. This must include the name of the
grant, the funding body (e.g., the government of a given country or European Union
grants such as the ERDF), the maximum amount for that grant and the deadline to
submit a proposal.
Then, for each company the database must record the date of the application to a given
grant, the amount requested, the outcome (successful/unsuccessful).
A grant can be given to a single production company or shared among several. Finally,
once the database is ready, the project will run a set of machine learning algorithms to
perform high level data analysis based on the different grants and their corresponding
impact with the aim to investigate the impacts of such funding against a list of criteria.
No additional information is provided at this stage from the project.
In the spec, the requirements are numerated from 1 to 5, as the scenario was not given
at that time. The details of each requirement are provided in the following:
1. Each production company may have received one or multiple grants, and grants can be shared by more than one company.
2. It is possible for each employee to have more than one telephone number. Each telephone number has a description associated with it (e.g., personal, or work).
3. Each production company is registered only once but can have many shareholders.
4. Each employee can either be a member of the crew OR a staff member. Each crew member can be an actor OR a director OR have another role. Each staff member belongs to a department. No duplication of data is allowed.
5. Each crew member may be part of one or more movies in a single role or many.
Based on that I have created THIS DIAGRAM.
I think I have all the entities, attributes and relationships down, but I'm missing the keys. Keys can't be names, right? I will use the company entity as an example: should I create new attributes like company_id to use as primary keys, or just underline the name attribute and use it as the primary key?
Also, please tell me if there's anything else wrong with the diagram.
Thanks a lot!
I created an ER diagram, but some entities don't have attributes that can be used as primary keys because they are names. I tried using them, but I don't think it's right.
The problem with names as primary keys
In your diagram, you use several names to identify entities: Grant, Production Company, Shareholder (full name), Employee, Movie (Title). You can in theory use them as primary keys. However, this is bad practice:
names can change (e.g. departments and companies can be renamed, movies can have a temporary working title);
names are often not sufficient to distinguish entities (e.g. different people can have the same name, such as Adam Smith);
names can be spelled differently across sources of information, and are also easily misspelled;
although not really noticeable with modern RDBMS, names are more time-consuming to search, and consume more memory when used as foreign keys.
How to choose a primary key?
You'd better use a primary key that guarantees uniqueness. You can then easily decide whether the same name corresponds to a different entity or not.
The next question that you'll face in your design is surrogate key vs. natural key:
When there's no other unique information, you'll have no choice but to use a surrogate key.
When there are other potentially unique attributes, you may choose to use either a natural key (e.g. company registration number, national insurance number together with a country code, movie code?) or a surrogate one.
Keep in mind that both have advantages and drawbacks, but the surrogate key is in general more robust, as natural keys sometimes turn out to be less stable than expected.
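A minimal sketch of that trade-off, using Python's built-in sqlite3 module. The table and column names (production_company, company_id, registration_no) and the sample values are hypothetical and not taken from the diagram. The surrogate key serves as primary key and foreign-key target, while the candidate natural key is still protected by a UNIQUE constraint:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE production_company (
    company_id      INTEGER PRIMARY KEY,   -- surrogate key (hypothetical)
    registration_no TEXT NOT NULL UNIQUE,  -- candidate natural key (hypothetical)
    name            TEXT NOT NULL          -- ordinary attribute, free to change
);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    company_id  INTEGER NOT NULL REFERENCES production_company(company_id),
    last_name   TEXT NOT NULL
);
""")

# Sample rows, invented purely for illustration.
conn.execute(
    "INSERT INTO production_company (registration_no, name) VALUES (?, ?)",
    ("REG-001", "Example Films"),
)
company_id = conn.execute(
    "SELECT company_id FROM production_company WHERE registration_no = ?",
    ("REG-001",),
).fetchone()[0]
conn.execute(
    "INSERT INTO employee (company_id, last_name) VALUES (?, ?)",
    (company_id, "Smith"),
)

# Renaming the company changes one row; the employee's foreign key is untouched.
conn.execute(
    "UPDATE production_company SET name = ? WHERE company_id = ?",
    ("Renamed Films", company_id),
)
employer_name = conn.execute(
    "SELECT c.name FROM employee e "
    "JOIN production_company c ON c.company_id = e.company_id"
).fetchone()[0]

# Two companies may share a name, but not a registration number.
conn.execute(
    "INSERT INTO production_company (registration_no, name) VALUES (?, ?)",
    ("REG-002", "Renamed Films"),
)
try:
    conn.execute(
        "INSERT INTO production_company (registration_no, name) VALUES (?, ?)",
        ("REG-001", "Another Company"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Had the name been the primary key, the rename would have had to be propagated to every referencing row, and the second company with the same name could not have been stored at all.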
Other remarks about your ERD
By the way, here are some issues and other remarks:
Works in relation does not relate Staff to anything else. From the name, it's obviously not a reflexive relation either. So this is a diagram error. department (name) and building should either be attached to a Department entity or be attributes of Staff.
In several cases you relate attributes to other attributes (actor-extra role, phone number-description). This is also a diagramming error. Either add the extra attribute to the same entity, or there's a missing relationship with a missing entity.
In one case you link two entities directly without a relation between the two (production company-application). This is an inconsistency that must also be corrected.
The following attributes are not real attributes but probably values of an unidentified attribute: producer, composer, actor, editor, xyzzy designer, advertising, HR, janitor.
Government authority is a misleading entity name: nowhere do you refer to data about the authority itself (name of the authority, e.g. "CNC", country of the government, ...). It's only information about the company's registration.
In your diagram you leave the hourly and monthly wage at the level of the Employee. This does not model accurately the requirements.
The link between the relation receive funding and the entity Application, which both carry the same attribute outcome, seems very ambiguous.
In the names of the entities, stay consistent: either singular or plural. Mixing both will lead to lots of typos.
Better to show cardinality on the link between the relation and the entity than on top of the relation: this avoids confusion about the direction of reading.
As a side remark, your question provides a wealth of interesting details that are not really needed to answer its core. Better to limit yourself to the information directly related to your issue in your next questions ;-)
Research or not research, keep in mind that GDPR may apply and that it requires inter alia privacy by design (some information about the shareholder and the employee may require some additional thoughts).

How to prevent a clutter of DTO objects?

I am using the DTO pattern, together with an auto-mapper library that helps map domain objects to DTO objects, and it works well.
Now that my application is getting bigger, I find myself needing many different DTO objects to support different data needs.
Let's say, for example, that my application displays a list of employees, and each employee has an age property and a salary property.
In my UI, one page shows just a list of all the employees; another shows a list of the department names the employees are in and the number of employees in each one; a third shows statistics for each department: average salary, average age, total employee salary, etc.
Consider that I have many departments and many employees (let's say millions of employees and thousands of departments, too many for the client to calculate statistics on its own).
My question is: how would you build an API that serves the client without creating many, many DTOs, and without making unnecessary calculations?
For example, in one flow the API counts the employees in a department without calculating the average salary, because the average salary is not of interest there and skipping it makes the API respond quicker; in another flow it calculates both.
Are there any other patterns to make this more efficient?
GraphQL is one potential option. It is a query language developed by Facebook that lets a client retrieve specific parts of a domain model without retrieving the whole object.
This works well for retrieving pre-existing domain data. It doesn't work as well for calculating data (you mention count of employees and avg salary). As long as these are defined attributes (of department) you're fine.
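The underlying idea, letting the client name the fields it wants and computing only those, can be sketched without any GraphQL library. Everything below (the STATISTICS registry, department_stats, the field names and the sample rows) is hypothetical and only illustrates the pattern. With millions of rows the aggregates would more plausibly be pushed down to the database than computed in application code:

```python
from typing import Callable, Dict, Iterable, List, Optional

Employee = Dict[str, float]


def _mean(values: Iterable[float]) -> Optional[float]:
    """Return the arithmetic mean, or None for an empty input."""
    items = list(values)
    return sum(items) / len(items) if items else None


# One entry per statistic the API can serve; nothing runs until it is requested.
STATISTICS: Dict[str, Callable[[List[Employee]], Optional[float]]] = {
    "employee_count": lambda emps: len(emps),
    "avg_salary": lambda emps: _mean(e["salary"] for e in emps),
    "avg_age": lambda emps: _mean(e["age"] for e in emps),
    "total_salary": lambda emps: sum(e["salary"] for e in emps),
}


def department_stats(employees: List[Employee], fields: List[str]) -> Dict[str, Optional[float]]:
    """Compute only the requested statistics and return them as a plain dict."""
    unknown = [f for f in fields if f not in STATISTICS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    return {f: STATISTICS[f](employees) for f in fields}


# Sample data, invented for illustration.
employees = [
    {"age": 30, "salary": 1000.0},
    {"age": 40, "salary": 3000.0},
]

count_only = department_stats(employees, ["employee_count"])
count_and_avg = department_stats(employees, ["employee_count", "avg_salary"])
```

A single response shape (a dictionary keyed by field name) then replaces one DTO class per page, at the cost of weaker static typing on the client side.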

Using one-to-many over many-to-many in SQL

I'm designing a database model where there are agents that can have customers.
From a best-practice standpoint, I'd like to know which relationship is best to use.
The thing is, a customer could be working with multiple agents. If we want each agent to treat the customer as if they were being worked from a different angle, is it best practice to design the model as a one-to-many relationship instead of a many-to-many?
In other words, if Agent A and Agent B are working with John Doe, should we treat John Doe as a separate entity for each agent, even though the record of John Doe may be the same (think of contact details)?
It sounds like you simply want a junction/association table with columns such as:
CustomerId
AgentId
You may also want dates, descriptions and other information describing the relationship being worked on.
You mean having an agent_customer table? It would have an auto-increment ID as PK, plus agentID and customerID, so that one agent can have multiple customers and a customer can have multiple agents. Make sure the combination of agentID and customerID is unique to avoid redundancy.
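A minimal sketch of such a junction table, using Python's built-in sqlite3 module. The exact column names, the extra started_on and description columns, and the sample rows are assumptions made for illustration. John Doe is stored once, and each pairing with an agent is one row in agent_customer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE agent (
    agent_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    phone       TEXT                      -- shared contact details live here, once
);
CREATE TABLE agent_customer (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    agent_id    INTEGER NOT NULL REFERENCES agent(agent_id),
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    started_on  TEXT,                     -- data about the relationship itself
    description TEXT,
    UNIQUE (agent_id, customer_id)        -- the pair is unique, not each column
);
""")

conn.executemany(
    "INSERT INTO agent (agent_id, name) VALUES (?, ?)",
    [(1, "Agent A"), (2, "Agent B")],
)
conn.execute(
    "INSERT INTO customer (customer_id, name, phone) VALUES (?, ?, ?)",
    (1, "John Doe", "555-0100"),
)
conn.executemany(
    "INSERT INTO agent_customer (agent_id, customer_id, description) VALUES (?, ?, ?)",
    [(1, 1, "angle worked by Agent A"), (2, 1, "angle worked by Agent B")],
)

agents_for_john = [
    row[0]
    for row in conn.execute(
        "SELECT a.name FROM agent a "
        "JOIN agent_customer ac ON ac.agent_id = a.agent_id "
        "WHERE ac.customer_id = ? ORDER BY a.name",
        (1,),
    )
]
customer_rows = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

# Linking the same agent and customer twice violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO agent_customer (agent_id, customer_id) VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Anything specific to one agent's angle on the customer goes on the agent_customer row, so the customer record never has to be duplicated per agent.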

First name disambiguation in SQL

So I've been handed a project that I'm trying to find a premise for. Essentially, I am going to be taking customer information from a number of transactional databases and merging it into one dimension table with various interesting information from all the records. Some of these people may be in many of the databases, or multiple times in the same database, or both.
Since the name comes from user input, one entry may say Sally Jones, one may say Susan Jones, one may say S Jones, and they may all still be the same person. The way I'm THINKING of going about this is finding disambiguations of as many names as I can and putting them into a bridge table, so that when I pull new info from a transaction database I can run it through the bridge table and match it to any of the names that are listed.
Has anyone done or heard of something similar? Or does anyone know of a table/list that I can import into Excel/SQL that will give me a starting point for first-name disambiguations?
Basically, you need a Clients table and a way to associate this table with the Transactions table. If the Clients table doesn't exist in the DB, I advise you to create it to make your task possible. It's extremely important to have a single identifier for each client and to connect clients with transactions, so that you know exactly which transactions belong to each client.
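A rough sketch of how a Clients table and the bridge table from the question could fit together, using Python's built-in sqlite3 module. The table names, the normalise helper and the rule of one client per alias are all assumptions. Deciding that "Susan Jones" and "S Jones" really are the same person remains a manual or heuristic step that the tables merely record:

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (
    client_id    INTEGER PRIMARY KEY,     -- the single identifier per person
    display_name TEXT NOT NULL
);
CREATE TABLE client_alias (               -- bridge: name variant -> client
    alias     TEXT PRIMARY KEY,
    client_id INTEGER NOT NULL REFERENCES client(client_id)
);
CREATE TABLE sales_transaction (
    transaction_id INTEGER PRIMARY KEY,
    raw_name       TEXT NOT NULL,         -- name as typed in the source system
    client_id      INTEGER REFERENCES client(client_id)  -- NULL until matched
);
""")


def normalise(raw_name: str) -> str:
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(raw_name.lower().split())


def resolve_client(conn: sqlite3.Connection, raw_name: str) -> Optional[int]:
    """Return the client_id for a known alias, or None if there is no match."""
    row = conn.execute(
        "SELECT client_id FROM client_alias WHERE alias = ?",
        (normalise(raw_name),),
    ).fetchone()
    return row[0] if row else None


# Variants taken from the question, all recorded as the same client.
conn.execute("INSERT INTO client (client_id, display_name) VALUES (1, 'Sally Jones')")
conn.executemany(
    "INSERT INTO client_alias (alias, client_id) VALUES (?, ?)",
    [(normalise(n), 1) for n in ("Sally Jones", "Susan Jones", "S Jones")],
)

# Incoming rows; "Tom Jones" is an invented name with no alias on file.
for raw in ["S  JONES", "susan jones", "Tom Jones"]:
    conn.execute(
        "INSERT INTO sales_transaction (raw_name, client_id) VALUES (?, ?)",
        (raw, resolve_client(conn, raw)),
    )

matched = conn.execute(
    "SELECT COUNT(*) FROM sales_transaction WHERE client_id = 1"
).fetchone()[0]
unmatched = [
    row[0]
    for row in conn.execute(
        "SELECT raw_name FROM sales_transaction WHERE client_id IS NULL"
    )
]
```

Rows left with a NULL client_id form the review queue: each one either becomes a new alias of an existing client or a new client.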

OLAP Dimension structure

I have a dimension "Customer". Each Customer can have several business units and several departments.
I should build 2 hierarchies: Customer->Department and Customer->Business Unit.
So I also need to set the key attribute. This is my question: what should be used as the key attribute?
Maybe I am doing this wrong?
Could you help?
To define hierarchies, you should ask the following questions:
If I group the departments, do I get a customer? If I group the business units, do I get a customer?
If I group the departments and business units together, do I get a customer?
If grouping the departments gives a customer, then the hierarchy is Customer > Department. The same goes for business units.
If grouping the department and business unit together (an attribute in the dimension that combines the two pieces of information, for example DPT1-BUS1) gives a customer, then the hierarchy is Customer > Department_Business.
It is not recommended to have null attributes in a dimension, so make sure that every customer has a business unit and a department. Otherwise, rethink the modelling of the data warehouse. Generally, the key of a dimension is an artificial auto-increment key...
I recommend that you read Kimball.
Hope this helps.