We have a Tabular model with several Fact tables and several Dim tables.
We would like to manages roles so that specific roles will not be able to see members of a certain attribute within a dimension.
So in an HR cube with a "Work Hour" measure - i would like to block a specific role from seeing the "Employee Name" attribute but still show the sum of "Work Hours" to the total employee.
While using multidimensional, i simply used an MDX expression which filters on the "All" member of the dimension thus showing the total but not the members of an attribute.
Don't know how to do so in Tabular Model.
Did someone encounter a similar request?
Thank you!
Yes, Tabular models don't give you the option of disabling "visual totals". So this isn't easy to do. However if you get creative you can do it. If you remember that calculated columns are calculated at processing time without security then you can store the rollups you need ahead of time. Store those rollups somewhere users can read them from even with security in effect. In this case you may need to put the rollups in a separate table, separate from the employees since all rows in that table will be hidden. Here is a full write up:
http://cathydumas.com/2012/05/19/row-security-and-hierarchiespart-1/
However in your case since you want to hide all Employee table rows that will cause all related fact table rows to disappear due to security. So here is what I would suggest. First, disable the relationship to the Employee table. Second, pattern your measures after this pattern:
Work Hours := IF(
COUNTROWS(Employee)>0,
CALCULATE(
SUM(FactHours[Work Hours]),
USERELATIONSHIP(FactHours[EmployeeKey], Employee[EmployeeKey])
),
SUM(FactHours[Work Hours])
)
The logic here is that if your user can't see any employees then don't enable that relationship. If your user can see employees, then enable the relationship.
Related
I'm taking a DBMS course and I need to design and build my own DB. I have a database for a hospital where doctors,nurses,support staff etc are in a ISA relationship to an Employee entity with the rest of the data like the name, address , salary and the rest of the employee data.
Designing a form, I want to be able to add an employee with all of their data in one form.
Is there a way to do a "conditional table" of sorts where if i select "doctor" from a drop-down i get to add to the doctor table too, and same for the rest of the entities under the ISA relationship?
Thx!
As a general rule, when dealing with data, you do NOT flip or switch tables for a given form or relatonal database design.
So, for example. If I have a table of customers. Well, now if I want to mark some of the customers as plumbers, and others as doctors? I don't create two tables.
All I would do is add ONE column to that customers table and it would simply allow me to set the type of customer. The reason for this design is "many" but some significant reasons are:
For each new type of customer, you would not create a new table. Worse, all of the forms, the reports, the SQL, the code you write? Well, all of that code would have to be modified EACH time you create a new table. So, you SIMPLY cannot adopt a design in which the concept of changing a table is part of that process.
Forms are bound to ONE table. For related data, you in most cases will use a sub form.
So, think of even a accounting system. They can have huge numbers of customers, and as a result, you can "query" that table to give you all customers. Or you might ask how many accounting firms are in the customer list. Or make a report that summeries by customer type a "count" of each type of customer.
So, buidling forms, or reports? They cannot on the fly "change" the tables they are using.
So, in place of a tables called:
SalesJan
SalesFeb
SalesMar
etc.
Well, now you can't query sales from Jan to mar, because the data is in different tables.
So, what you do is have ONE table called "sales", and you add ONE column of the date. Now, at the start of each new month, you don't have to create a new table.
Now, of course in some cases it makes sense to create a separate table. For example, a table of customers, and a table of employees in a database is just fine. It makes sense in this case to use two tables, since the information about a customer and what they can do and the kind of information is VERY different then how you would deal with employees.
So, with above? Well, if I need to print mailing labels for all customers and all employees? That would require two different reports. And very likely the table structure for the two tables is different.
Bottom line:
If you working on design or form or report? And you needing to try and change the table that the form/report/code etc is going to operate on? This is a sign that your design approach has gone complete off the rails and is the wrong design.
So, in the case of doctors, nurses etc.? Well, they are all hospital staff, and MOST of the basic information about such employees will be common, much the same, and thus a SINGLE table of "employees" makes the most sense. You would only need a nice "employee type" combo box on that one form, and thus you can add/enter/edit/search any employee in that one table.
The fact that you "want to search" for a employee show that all these people "are" employees and thus belong in one table. And the basic information about all employees is going to be the same anyway. If you find you are attempting to create a new table but with near identical structures over and over, then just like a new table for each month sales, or a new table for each new kind of employee? Simply add the "one" column that allows you to make that distinguish, and not a whole new table.
Now one COULD even attempt to put patients in the same table, but then again, dealing with patents as opposed employees is a considerable different kind of "thing".
So employees are employees - even different kinds. (manager, cleaning staff etc.).
And patients are patients - even different kinds (long term care, emergency etc.).
To keep this as short as possible I'm going to use and example.
So let's say I have a simple database that has the following tables:
company - ( "idcompany", "name", "createdOn" )
user - ( "iduser", "idcompany", "name", "dob", "createdOn" )
event - ( "idevent", "idcompany", "name", "description", "date", "createdOn" )
Many users can be linked to a single company as well as multiple events and many events can be linked to a single company. All companies, users and events have columns as show above in common. However, what if I wanted to give my customers the ability to add custom fields to both their users and their events for any unique extra information they wish to store. These extra fields would be on a company wide basis, not on a per record basis ( so a company adding a custom field to their users would add it to all of their users not just one specific user ). The custom fields also need to be sesrchable and have the ability to be reported on, ideally automatically with some sort of report wizard. Considering the database is expected to have lots of traffic as well as lots of custom fields, what is the best solution for this?
My current research and findings in possible solutions:
To have generic placeholder columns such as "custom1", "custom2" etc.
** This is not viable as there will eventually be too many custom columns and there will be too many NULL values stored in the database
To have 3x tables per current table. eg: user, user-custom-field, user-custom-field-value. The user table being the same. The user-custom-field table containing the information about the new field such as name, data type etc. And the user-custom-field-value table containing the value for the custom field
** This one is more of a contender if it were not for its complexity and table size implications. I think it will be impossible to avoid a user-custom-field table if I want to automatically report on these fields as I will have to store the information on how to report on these fields here. However, In order to pull almost any data you would have to do a million joins on the user-custom-field-value table as well as the fact that your now storing column data as rows which in a database expected to have a lot of traffic as well as a lot of custom fields would soon cause a problem.
Create a new user and event table for each new company that is added to the system removing the company id from within those tables and instead using it in the table name ( eg user56, 56 being the company id ). Then allowing the user to trigger DB commands that add the new custom columns to the tables giving them the power to decide if it has a default value or auto increments etc.
** Everytime I have seen this solution it has always instantly been shut down by people saying it would be unmanageable as you would eventually get thousands of tables. However nobody really explains what they mean by unmanageable. Firstly as far as my understanding goes, more tables is actually more efficient and produces faster search times as the tables are much smaller. Secondly, yes I understand that making any common table changes would be difficult but all you would have to do is run a script that changes all your tables for each company. Finally I actually see benefits using this method as it would seperate company data making it impossible for one to accidentally access another's data via a potential bug, plus it would potentially give the ability to back up and restore company data individually. If someone could elaborate on why this is perceived as a bad idea It would be appreciated.
Convert fully or partially to a NoSQL database.
** Honestly I have no experience with schemaless databases and don't really know how dynamic user defined fields on a per record basis would work ( although I know it's possible ). If someone could explain the implications of the switch or differences in queries and potential benefits that would be appreciated.
Create a JSON column in each table that requires extra fields. Then add the extra fields into that JSON object.
** The issue I have with this solution is that it is nearly impossible to filter data via the custom columns. You would not be able to report on these columns and until you have received and processed them you don't really know what is in them.
Finally if anyone has a solution not mentioned above or any thoughts or disagreements on any of my notes please tell me as this is all I have been able to find or figure out for myself.
A typical solution is to have a JSON (or XML) column that contains the user-defined fields. This would be an additional column in each table.
This is the most flexible. It allows:
New fields to be created at any time.
No modification to the existing table to do so.
Supports any reasonable type of field, including types not readily available in SQL (i.e. array).
On the downside,
There is no validation of the fields.
Some databases support JSON but do not support indexes on them.
JSON is not "known" to the database for things like foreign key constraints and table definitions.
I have a fact table called "FactActivity" and a few dimension tables like users, clients, actions, date and tenants.
I create measure groups corresponding to each of them as follows
FactActivity => Sum of ActivityCount colums
DimUser => Count of rows
DimTenant => Count of rows
DimDate => Count of rows and distinct count of weekofyear column
Each user can do multiple actions using multiple clients. A tenant is logical grouping of users. So a tenant contain multiple users but a user can't belong to more than 1 tenant. All the dimension tables and fact tables are connected to DimDate via regular relationship.
The cube structure is as follows.
Now I want to defined the dimension relationships to each of the measure group. Some of them are Many-Many relationsip (to enable distinct count calculation). The designer is showing me multiple options to choose from for many of the intersections. I'm confused as to which one to select as intermediate measure group. Should I always pick the measure group whose total # rows is the least ex: DimDate? Or what is the right logic to determine the intermediate measure group.
This is what I got. IS this right? If no, what is wrong?
For more information to hep choose the right answer.
FactActivity = 1 billion rows
DimUser = 35 million rows
DimTenant = 1 million rows
DimDate = 1000 rows
The correct way to choose the intermediate measure group depends on how you want to evaluate your measures with respect to the dimension related:
Let's start with Activity measure group to Tenant dimension: The question is: How should Analysis Services determine the activity count (or any other measure in the Activity measure group) of a tenant? The only reasonable way to determine this would be to go from the activity fact table through the user table to the tenant table. And actually, the last relationship is not a many-to-many relationship, but a many-to one relationship. I. e. you could optimize away the tenant dimension by integrating it into the user dimension. However, using a many-to-many relationship will work as well, just be a little less efficient. You might also consider using a reference relationship from user to tenant instead of a many-to-many relationship. And there may be other considerations why you may have chosen to have them two separate dimensions, thus I do not discuss this any further.
Now let us continue with the next one: Tenant measure group to User dimension: The way you have configured it (using the date measure group) means that for each date that a tenant and a user have in common, the tenant count of a user adds one to the count. This is probably not what you want. I would assume you want to relate tenant measures to user dimension by the user measure group. However, I am not sure what the purpose of the DateKey in the user and tenant dimension tables is at all. Thus, your relationship may be correct.
Let's continue with the relationships from the Date measure group to the Tenant and User dimensions. I would assume there should be no relationship at all, as the week of the year and the date count do not depend on tenants or users. Please note that it is absolutely ok to have no relationship between some measure groups and some dimensions. If you look at the Microsoft sample cube "Adventure Works", it has more gray cells (i. e. measure group and dimension being unrelated) in the Dimension Usage than white ones (i. e. there is some kind of relationship between measure group and dimension, of whichever type). In the default setting of IgnoreUnrelatedDimensions = true of a measure group, this means that the measure value will be the same for all members of the dimension. This should be the case for date count and week of year. However, again, as I do not know the purpose of the DateKey in the user and tenant dimension tables, I am not sure if this assumption is correct for your data.
And after these examples, I would hope you can continue with the rest of relationships yourself.
I'm quite new to MDX and I'm having a bit of an issue with the aggregation of one of my measures.
In my DSV I have an "Events" table. We track the agents that run these events, and since multiple agents can be involved in running a single event, I have split this out into a separate table of "Agent" with a bridging table in the middle:
http://imgur.com/uAy3moC
I want to track what is called "Coverage", which is the number of Events held in a particular week and also each agent who ran the event. So if there were 3 events held one week, and one of these events were run by two agents, that would be a coverage of 4.
When I go to analyse the cube, dragging on Week Commencing and my count of Events, I note that it isn't right - It only considers individual events and not the number of agents. Dragging on Agents solves this but I still want to see an overall figure without having to drag on Agents.
So I created a calculated member like so:
CREATE MEMBER CURRENTCUBE.[Measures].[Visit Coverage]
AS
IIF([Agents].[Agent].currentmember.parent IS NULL,
SUM([Agents].[Agent].[Agent], [Measures].[Events Count]),
[Measures].[Events Count]);
So basically, if all agents are selected (parent is null), sum up all of the events count for each agent, otherwise just give me the events for each individual agent if I'm analysing by agent. This works great...and also works if I want to filter by one particular agent, but then falls over if I try to filter by more than one (but less than all) agents, giving me a null value.
I'm completely stumped on how to solve this one, could anyone help me out?
Chris
Your table design covers the requirement to count the number of events, avoiding a double-counting of agents via the many-to-many bridge table between the main fact table and the agent dimension table. If you want to have a measure that does not avoid double-counting, then its fact table should directly link the dimension tables.
Hence, I would create a view or named query that has foreign keys to your three dimension tables, and use this as the base of a new measure group in addition to the existing ones. This view or named query could just be built as a join from the main fact table and the bridge table. Then add a count as the coverage measure to this measure group.
We use a third-party product to manage our sports centre membership. We have several membership types (eg. junior, student, staff, community) and several membership statuses (eg. annual, active, inactive, suspended). Unfortunately the product only records a member's current membership type and status. I'd like to be able to track the way our members' type and status have changed over time.
At present, we have access to the product's database design. It runs on SQL Server and we regularly run our own SQL queries against the product's tables to produce our own tables. We then link our tables to pivot-tables in Excel to produce charts. So we're familiar with database design and SQL. However we're stuck as to how to best approach this problem.
The product records a member's membership purchases and their start and expiry dates. So we can work back through that data to determine a member's type and status at any point in time. For example, if they bought a junior membership on Jan 1, 2007 and it expired on Dec 31, 2007 and then they bought a student membership on Jun 1, 2008, we can see their status went from active to inactive to active (on Jan 1, 2008 and Jun 1, 2008, respectively) and their type went from junior to student (on Jun 1, 2008).
Essentially we'd like to turn a member's type and status properties into temporal properties or effectivities a-la Fowler (or some other thing that varies with time).
Our question (finally :) - given the above: what database table design would you recommend we use to hold this member information. I imagine it would have a column for MemberID so we can key into the existing Member table. It would also need to store a member's status and type and the date range they were held for. We'd like to be able to easily write queries against this table(s) to determine how many members of each type and status we had at a given point in time.
UPDATE 2009-08-25: Have been side-tracked and haven't had a chance to try out the proposed solutions yet. Hope to do so soon and will select an answer based on the results.
Given that your system is already written and in place, the simplest approach to this problem (and the one that affects the existing database/code the least), is to add a membership history table that contains MemberID, status, type and date columns. Then add an UPDATE and an INSERT trigger to the main member table. When these triggers fire, you write the new values for the member (along with the date of the status change) into the member history table. You can then just query this table to get the histories for each member.
This is fairly simple to implement, and won't affect the existing system at all.
I'll write this for you for a free membership. :)
I cannot recommend you enough to read Joe Celko's "Sql for smarties - advanced sql programming". he has a whole chapter on temporal database design AND how to (effeciently and effectively) run Temporal Projection, Selection and Temporal Join queries. And I would not do him justice to even attempt to explain what he says in his chapter in this post.
I would create a reporting database that was organized into a star schema. The membership dimension would be arranged temporally, so that there would be different rows for the same member at different points in time. That way different rows in the fact table could pertain to different points in history.
Then I would create update procedures for updating the reporting database periodically, say one a week, from the main database. This is where the main work would come.
Then, I would drive the reports off the reporting database. It's pretty easy to make a star schema do the same things a pivot table does. If necessary, I'd get some kind of OLAP tool to sit in front of the reporting database.
This is a lot of work, but it would pay off over time.
I would put the membership info in it's own table with start and end dates. Keeping the customer in separate table. This is a pain if you need the "current" membership info all the time but there are many ways to get around that either through queries or triggers.