Can I call fields from fact table on MDX? - mdx

Is it possible to reference a field from the fact table in MDX, or should I declare it in the schema?
I have an age field in my fact table. I need it to split my measure values into groups such as '18-20', '21-25', etc.
Maybe it would look like this:
|  | all region                    |
|  | city 1        | city 2        |
|  | 18-20 | 21-25 | 18-20 | 21-25 |

If you need to group by age, then your age is not used as a fact, but as an attribute. Hence, the age should be as an attribute column in a dimension. And the simplest way to implement the age groups would be to add another attribute column to that dimension table containing these groups.
Thus, technically, there should be an age column and an age group column in one of your dimension tables, probably a dimension table containing other data about people. That table should then be referenced by the fact table. As you did not describe the structure of your data warehouse tables, I am sorry that I cannot be more specific.
Also, this does not mean that you cannot use the age as a measure in other contexts. Since you write that the age is contained in your fact table, I assume you already use it as a measure.
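As a rough illustration of the age group attribute described above, here is a minimal sketch using Python's built-in sqlite3 module rather than an OLAP server. The table names dim_person and fact_sales, the bucket boundaries, and the sample rows are all assumptions made up for the example; the point is only that the group lives in the dimension and the fact table references it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical person dimension carrying both age and age group.
    CREATE TABLE dim_person (
        person_id INTEGER PRIMARY KEY,
        city      TEXT NOT NULL,
        age       INTEGER NOT NULL,
        age_group TEXT NOT NULL
    );
    -- Hypothetical fact table referencing the dimension.
    CREATE TABLE fact_sales (
        person_id INTEGER NOT NULL REFERENCES dim_person(person_id),
        amount    INTEGER NOT NULL
    );
""")

def age_group(age):
    """Map an age to a bucket label; the boundaries are assumptions."""
    if 18 <= age <= 20:
        return "18-20"
    if 21 <= age <= 25:
        return "21-25"
    return "other"

# Made-up sample people: (person_id, city, age).
people = [(1, "city 1", 19), (2, "city 1", 23), (3, "city 2", 18), (4, "city 2", 25)]
conn.executemany(
    "INSERT INTO dim_person VALUES (?, ?, ?, ?)",
    [(pid, city, age, age_group(age)) for pid, city, age in people],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40), (1, 5)],
)

# Group the measure by city and age group through the dimension.
rows = conn.execute("""
    SELECT p.city, p.age_group, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_person p ON p.person_id = f.person_id
    GROUP BY p.city, p.age_group
    ORDER BY p.city, p.age_group
""").fetchall()
for row in rows:
    print(row)
```

In a cube, city and age_group would become attributes of the dimension, and the MDX query would place them on an axis instead of using GROUP BY.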

Related

Turn two database tables into one?

I am having a bit of trouble modelling a relational database for an inventory management system. For now, it only has 3 simple tables:
Product
ID | Name | Price
Receivings
ID | Date | Quantity | Product_ID (FK)
Sales
ID | Date | Quantity | Product_ID (FK)
As Receivings and Sales are identical, I was considering a different approach:
Product
ID | Name | Price
Receivings_Sales (the name doesn't matter)
ID | Date | Quantity | Type | Product_ID (FK)
The column type would identify if it was receiving or sale.
Can anyone help me choose the best option, pointing out the advantages and disadvantages of either approach?
The first one seems reasonable because I am thinking in an ORM way.
Thanks!
Personally I prefer the first option, that is, separate tables for Sales and Receiving.
The two biggest disadvantages of option number 2, merging the two tables into one, are:
1) Inflexibility
2) Unnecessary filtering on use
First, on inflexibility. If your requirements expand (or you simply overlooked something), you will have to break up your schema or end up with unnormalized tables. For example, say your sales now need to record the sales clerk who handled the transaction; that obviously has nothing to do with receiving. And if you do both retail and wholesale sales, how would you accommodate that in the merged table? How about discounts or promos? Those are just the obvious cases. Now consider receiving. What if we want to tie each receiving to a purchase order? Purchase order details such as P.O. number, P.O. date, and supplier name do not belong under sales; they relate to receiving.
Secondly, on unnecessary filtering on use. If you have a merged table and want only the Sales (or Receiving) portion, you have to filter out the other portion, either in your back-end or in your front-end program. With separate tables, you only deal with one table at a time.
Additionally, you mentioned ORM. The first option fits that best, because an object or entity should be distinct from other entities or objects.
If the tables really are and always will be identical (and I have my doubts), then name the unified table something more generic, like "InventoryTransaction", and then use negative numbers for one of the transaction types: probably sales, since that would correctly mark your inventory in terms of keeping track of stock on hand.
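To make the signed-quantity idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The InventoryTransaction name comes from the suggestion above; the sample product, dates, and quantities are invented for illustration, and the convention "positive means receiving, negative means sale" is the assumption being demonstrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
    CREATE TABLE InventoryTransaction (
        ID         INTEGER PRIMARY KEY,
        Date       TEXT NOT NULL,
        Quantity   INTEGER NOT NULL,  -- positive = receiving, negative = sale
        Product_ID INTEGER NOT NULL REFERENCES Product(ID)
    );
    -- Made-up sample data.
    INSERT INTO Product VALUES (1, 'Widget', 2.50);
    INSERT INTO InventoryTransaction (Date, Quantity, Product_ID) VALUES
        ('2024-01-01', 100, 1),
        ('2024-01-02', -30, 1),
        ('2024-01-03', -20, 1);
""")

# Stock on hand is just the sum of the signed quantities.
stock_on_hand = conn.execute(
    "SELECT SUM(Quantity) FROM InventoryTransaction WHERE Product_ID = ?", (1,)
).fetchone()[0]
print(stock_on_hand)
```

With this convention no Type column is needed, but the table can still only hold columns that make sense for both kinds of transaction.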
The fact that headings are the same is irrelevant. Seeking to use a single table because headings are the same is misconceived.
-- person [source] loves person [target]
LOVES(source,target)
-- person [source] hates person [target]
HATES(source,target)
Every base table has a corresponding predicate aka fill-in-the-[named-]blanks statement describing the application situation. A base table holds the rows that make a true statement.
Every query expression combines base table names via JOIN, UNION, SELECT, EXCEPT, WHERE condition, etc and has a corresponding predicate that combines base table predicates via (respectively) AND, OR, EXISTS, AND NOT, AND condition, etc. A query result holds the rows that make a true statement.
Such a set of predicate-satisfying rows is a relation. There is no other reason to put rows in a table.
(The other answers here address, as they must, proposals for and consequences of the predicate that your one table could have. But if you didn't propose the table because of its predicate, why did you propose it at all? The answer is, since not for the predicate, for no good reason.)
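The correspondence between query operators and predicates can be tried out on the LOVES and HATES tables from this answer. The following is only a sketch using Python's built-in sqlite3 module; the people named in the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LOVES (source TEXT, target TEXT);
    CREATE TABLE HATES (source TEXT, target TEXT);
    -- Made-up sample rows.
    INSERT INTO LOVES VALUES ('Ann', 'Bob'), ('Bob', 'Cid');
    INSERT INTO HATES VALUES ('Bob', 'Ann'), ('Ann', 'Cid');
""")

# JOIN combines predicates with AND:
#   person [source] loves person [target]
#   AND person [target] hates person [source]
loved_but_hated_back = conn.execute("""
    SELECT l.source, l.target
    FROM LOVES l
    JOIN HATES h ON h.source = l.target AND h.target = l.source
""").fetchall()

# UNION combines predicates with OR:
#   person [source] loves person [target]
#   OR person [source] hates person [target]
loves_or_hates = conn.execute("""
    SELECT source, target FROM LOVES
    UNION
    SELECT source, target FROM HATES
    ORDER BY source, target
""").fetchall()

print(loved_but_hated_back)
print(loves_or_hates)
```

Note that the UNION result has the same headings as LOVES and HATES but a different predicate, which is why merging the two base tables would lose information unless a distinguishing column were added.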

SSAS Aggregation on Distinct ID

I wish to change the default aggregation from SUM to SUM on Distinct ID Values.
This is the current behaviour
ID Amount
1 $10
1 $10
2 $20
3 $30
3 $30
Sum Total = $90
By default, I am getting a sum of $90. I wish to do the sum on distinct ids and get a value of $60. How would I modify the default Aggregation Behavior to achieve this result?
Design your data as a many-to-many relationship. Create one table/view with one record per ID and the amount column from the data shown in your question (the main fact table), and one table/view with one record per record of your data as shown in your question (presumably with another column, as otherwise it would not make sense to have the data as shown). This will be the m2m dimension table. Then create a bridge table/view containing the id of the m2m dimension table and your ID column.
Then create the following AS objects: a measure group from the main fact table and a dimension on column ID of the same table (this is fine if no other column makes a separate dimension table meaningful; otherwise, you would be better off with a separate dimension table having ID as the primary key). Create a dimension from the m2m dimension table, and a measure group containing only the invisible measure "count" from the bridge table. Finally, on the "Dimension Usage" tab of Cube Designer, set the relationship between the m2m dimension and the main measure group to many-to-many via the bridge measure group.
See http://technet.microsoft.com/en-us/library/ms170463.aspx for a tutorial on many-to-many relationships.

Basic question: how to properly redesign this schema

I am hopping on a project that sits on top of a SQL Server 2008 DB with what seems to me like an inefficient schema. However, I'm not an expert at anything SQL, so I am seeking guidance.
In general, the schema has tables like this:
ID | A | B
ID is a unique identifier
A contains text, such as animal names. There's very little variety; maybe 3-4 different values in thousands of rows. This could vary with time, but still a small set.
B is one of two options, but stored as text. The set is finite.
My questions are as follows:
Should I create another table for names contained in A, with an ID and a value, and set the ID as the primary key? Or should I just put an index on that column in my table? Right now, to get a list of A's, it does "select distinct(a) from table" which seems inefficient to me.
The table has a multitude of columns for properties of A. It could be like: Color, Age, Weight, etc. I would think that this is better suited in a separate table with: ID, AnimalID, Property, Value. Each property is unique to the animal, so I'm not sure how this schema could enforce this (the current schema implies this as it's a column, so you can only have one value for each property).
Right now the DB is easily readable by a human, but its size is growing fast and I feel like the design is inefficient. There is currently no index anywhere. As I said, I'm not a pro, but I will read more on the subject. The goal is to have a fast system. Thanks for your advice!
This sounds like a database that might represent a veterinary clinic.
If the table you describe represents the various patients (animals) that come to the clinic, then the properties specific to them probably belong on the primary table. But since you say column "A" contains a species name, it might be worthwhile to link that to a secondary table to avoid the redundancy of storing those names:
For example:
Patients
--------
ID  Name  SpeciesID  Color        DOB         Weight
1   Spot  1          Black/White  2008-01-01  20
Species
-------
ID  Species
1   Cocker Spaniel
If your main table should be instead grouped by customer or owner, then you may want to add an Animals table and link it:
Customers
---------
ID  Name
1   John Q. Sample
Animals
-------
ID  CustomerID  SpeciesID  Name  Color        DOB         Weight
1   1           1          Spot  Black/White  2008-01-01  20
...
As for your original column B, consider converting it to a boolean (BIT) if you only need to store two states. Barring that, consider CHAR to store a fixed number of characters.
Like most things, it depends.
By having the animal names directly in the table, it makes your reporting queries more efficient by removing the need for many joins.
Going with something like 3rd normal form (having an ID/Name table for the animals) makes your database smaller, but requires more joins for reporting.
Either way, make sure to add some indexes.
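Here is a minimal sketch of the lookup-table-plus-index approach discussed in these answers. It uses Python's built-in sqlite3 module instead of SQL Server, so the DDL syntax is only an approximation of what you would write there. The index name IX_Patients_SpeciesID and the extra sample rows ('Siamese', 'Rex', 'Tom') are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Species (
        ID      INTEGER PRIMARY KEY,
        Species TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Patients (
        ID        INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        SpeciesID INTEGER NOT NULL REFERENCES Species(ID)
    );
    -- Hypothetical index name; it speeds up lookups and joins by species.
    CREATE INDEX IX_Patients_SpeciesID ON Patients(SpeciesID);

    -- Made-up sample rows.
    INSERT INTO Species VALUES (1, 'Cocker Spaniel'), (2, 'Siamese');
    INSERT INTO Patients VALUES (1, 'Spot', 1), (2, 'Rex', 1), (3, 'Tom', 2);
""")

# Listing the species reads the small lookup table instead of running
# SELECT DISTINCT over thousands of patient rows.
species_list = [r[0] for r in conn.execute(
    "SELECT Species FROM Species ORDER BY Species")]

# Reporting by species name now needs one join.
spaniels = [r[0] for r in conn.execute("""
    SELECT p.Name
    FROM Patients p
    JOIN Species s ON s.ID = p.SpeciesID
    WHERE s.Species = ?
    ORDER BY p.Name
""", ("Cocker Spaniel",))]

print(species_list)
print(spaniels)
```

The trade-off mentioned above is visible here: the species list becomes trivial to fetch, while per-patient reports pay for one extra join.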

storing multiple formats in a table

So here's the basic problem: I'd like to be able to store various fields in a database. These can be short textfields (maybe 150 characters max, probably more like 50 in general) and long textfields (something that can store a whole page full of text). Ideally more types can be added later on.
These fields are grouped by common field_group ids, and their type shouldn't really have anything to do with categorization.
So what's the best way to represent this in MySQL? One table with short_text and long_text columns of differing types, one of which is to be NULL? Or is there a more elegant solution?
(I'd like this to be primarily driven by the ease of selecting all fields with a given field_group_id.)
Clarification
I'm essentially attempting to allow users to create their own tables, but without actually creating tables.
So you'd have a 'Book' field group, which would have the fields 'Name' (short text), 'Summary' (long text). Then you would be able to create entries into that book. I realize that this is essentially the whole point of MySQL, but I need to have a LOT of these and don't want users creating whole tables in my database.
What you are looking for is called an EAV (entity-attribute-value) model. With an EAV model you can build any freaking database in the world with only inserts. It's really horrible for a lot of reasons, but your case sounds so looney-tunes that it could work.
Build an Entity table
In here you'd list
Car
Person
Plant
Build an Attribute Table.
Here you'd list the PK from Entity and the list of attributes.
I'll use the word instead of a number PK.
Car | Engine Cylinders
Car | Doors
Car | Make
Person | First Name
Person | Last Name
Then, in a third table, you'd list the actual values for each one. Again I'm using the words here, but you'd have numeric keys.
Car | Engine Cylinders | 4
Car | Doors | 4
Car | Make | Honda
Person | First Name | Stephanie
Person | Last Name | Page
If you want to get tricky, instead of one column for the value you could have 4 columns:
a number
a varchar
a date
a clob
Then, in the Attribute table, you could add a column that says which of those columns holds the data.
If you plan on this database being multitenant, you'll need to add an OWNER table as the parent of the entity table, so you and I could both have a Car entity.
But this SUCKS to query, SUCKS to index, SUCKS to use for anything else but a toy app.
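For reference, the three tables described above could look roughly like the sketch below. It runs on Python's built-in sqlite3 module rather than MySQL, and the table and column names (entity, attribute, attribute_value, value_text) are assumptions chosen for the example. It follows the answer's simplified layout with a single value column, and it stores one value per attribute without any per-record id, which a real application would need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE attribute (
        id        INTEGER PRIMARY KEY,
        entity_id INTEGER NOT NULL REFERENCES entity(id),
        name      TEXT NOT NULL,
        UNIQUE (entity_id, name)
    );
    -- Single text column for values; the four-column variant would add
    -- number/date/clob columns plus a type marker on attribute.
    CREATE TABLE attribute_value (
        attribute_id INTEGER NOT NULL REFERENCES attribute(id),
        value_text   TEXT
    );

    INSERT INTO entity VALUES (1, 'Car'), (2, 'Person');
    INSERT INTO attribute VALUES
        (1, 1, 'Engine Cylinders'), (2, 1, 'Doors'), (3, 1, 'Make'),
        (4, 2, 'First Name'), (5, 2, 'Last Name');
    INSERT INTO attribute_value VALUES
        (1, '4'), (2, '4'), (3, 'Honda'), (4, 'Stephanie'), (5, 'Page');
""")

# Even "show me the car" takes two joins, and every value comes back as text.
car = conn.execute("""
    SELECT a.name, v.value_text
    FROM entity e
    JOIN attribute a       ON a.entity_id = e.id
    JOIN attribute_value v ON v.attribute_id = a.id
    WHERE e.name = ?
    ORDER BY a.id
""", ("Car",)).fetchall()
print(car)
```

The query illustrates the complaint above: reading back one logical row means joining three tables, and filtering on a typed value (say, cars with more than 2 doors) requires casting text.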
I don't know exactly what you mean by "field group", but if the information (short text, long text) all belongs to a certain entry, you can create a single table and include all those columns.
Say you have a bunch of books with a title and a summary:
table: `books`
- id, int(11) // unique for each book
- title, varchar(255)
- writer, varchar(50)
- summary, text
- etc
Fields that don't necessarily need to be set can be set to NULL by default.
To retrieve the information, simply select all the fields:
SELECT * FROM books WHERE id = 1
Or some of the fields:
SELECT title, writer FROM books ORDER BY title ASC

redundant column

I have a database that has two tables, these tables look like this
codes
id | code | member_id
1 | 123 | 2
2 | 234 | 1
3 | 345 |
4 | 456 | 3
members
id | code_id | other info
1 | 2 | blabla
2 | 1 | blabla
3 | 4 | blabla
The basic idea is that if a code is taken, its member_id field is filled in. However, this creates a circular link (members points to codes, codes points to members). Is there a different way of doing this? Is this actually a bad thing?
Update
To answer your questions: there are three different code tables with approximately 3.5 million codes each, and each table is searched depending on different criteria. If the member_id column is empty, the code is unclaimed; otherwise, the code is claimed. This is done so that when we search the database we do not need to include another table to tell whether a code is claimed.
The members table contains the claimants for every single code, so all 10.5 million members.
The additional info has things like mobile, flybuys.
The mobile is how we identify the member, but each entry is considered a different member.
It's a bad thing because you can end up with anomalies. For example:
codes
id | code | member_id
1 | 123 | 2
members
id | code_id | other info
2 | 4 | blabla
See the anomaly? Code 1 references its corresponding member, but that member doesn't reference the same code in return. The problem with anomalies is you can't tell which one is the correct, intended reference and which one is a mistake.
Eliminating redundant columns reduces the chance for anomalies. This is a simple process that follows a few very well defined rules, called rules of normalization.
In your example, I would drop the codes.member_id column. I infer that a member must reference a code, but a code does not necessarily reference a member. So I would make members.code_id reference codes.id. But it could go the other way; you don't give enough information for the reader to be sure (as @OMG Ponies commented).
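A minimal sketch of that layout, using Python's built-in sqlite3 module and the sample rows from the question: codes.member_id is gone, members.code_id is a foreign key to codes.id, and a code is unclaimed when no member references it. The UNIQUE constraint on code_id is an assumption (one claimant per code), and other_info is a stand-in column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE codes (
        id   INTEGER PRIMARY KEY,
        code TEXT NOT NULL
    );
    CREATE TABLE members (
        id         INTEGER PRIMARY KEY,
        -- UNIQUE is an assumption: at most one member per code.
        code_id    INTEGER NOT NULL UNIQUE REFERENCES codes(id),
        other_info TEXT
    );
    INSERT INTO codes VALUES (1, '123'), (2, '234'), (3, '345'), (4, '456');
    INSERT INTO members VALUES (1, 2, 'blabla'), (2, 1, 'blabla'), (3, 4, 'blabla');
""")

# Unclaimed codes are those with no referencing member.
unclaimed = conn.execute("""
    SELECT c.id, c.code
    FROM codes c
    LEFT JOIN members m ON m.code_id = c.id
    WHERE m.id IS NULL
""").fetchall()
print(unclaimed)

# The foreign key rejects a member pointing at a code that does not exist.
try:
    conn.execute("INSERT INTO members VALUES (4, 99, 'blabla')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
print(dangling_rejected)
```

The cost is the join the asker wanted to avoid when checking whether a code is claimed; with an index on members.code_id (implied here by UNIQUE) that lookup stays cheap.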
Yeah, this is not good because it presents opportunities for data integrity problems. You've got a one-to-one relationship, so either remove code_id from the members table, or member_id from the codes table. (In this case it seems like it would make more sense to drop code_id from members, since it sounds like you're more frequently going to be querying codes to see which are not assigned than querying members to see which have no code, but you can make that call.)
You could simply drop the member_id column and use a foreign key relationship (or its absence) to signify the relationship or lack thereof. The code_id column would then be used as a foreign key to the code. Personally, I do think it's bad simply because it makes it more work to ensure that you don't have corrupt relationships in the DB -- i.e., you have to check that the two columns are synchronized between the tables -- and it doesn't really add anything in the general case. If you are running into performance problems, then you may need to denormalize, but I'd wait until it was definitely a problem (and you'd likely replicate more than just the id in that case).
It depends on what you're doing. If each member always gets exactly one unique code then just put the actual code in the member table.
If there are a set of codes and several members share a code (but each member still has just one) then remove the member_id from the codes table and only store the unique codes. Access a specific code through a member. (you can still join the code table to search on codes)
If a member can have multiple codes, then remove the code_id from the member table and the member_id from the code table, and create a third table that relates members to codes. Each record in the member table should be a unique record and each record in the code table should be a unique record.
What is the logic behind having the member code in the code table?
It's unnecessary since you can always just do a join if you need both pieces of information.
By having it there you create the potential for integrity issues since you need to update BOTH tables whenever an update is made.
Yes this is a bad idea. Never set up a database to have circular references if you can help it. Now any change has to be made both places and if one place is missed, you have a severe data integrity problem.
First question: can each code be assigned to more than one member? Or can each member have more than one code? (This includes over time as well as at any one moment, if you need historical records of who had what code when.) If the answer to either is yes, then your current structure cannot work. If the answer to both is no, why do you need two tables?
If you can have multiple codes and multiple members, you need a bridging table that has a member id and a code id. If you can have multiple members assigned one code, put the code id in the members table. If it is the other way, put the member id in the code table. Then properly set up the foreign key relationship.
@Bill Karwin correctly identifies this as a probable design flaw which will lead to anomalies.
Assuming code and member are distinct entities, I would create a third table...
What is the relationship between a code and member called? An oath? If this is a real life relationship, someone with domain knowledge in the business will be able to give it a name. If not, look for further design flaws:
oaths
code_id | member_id
1 | 2
2 | 1
4 | 3
The data suggest that a unique constraint is required for (code_id, member_id).
Once the data is 'scrubbed', drop the columns codes.member_id and members.code_id.
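The third-table design could be sketched as follows with Python's built-in sqlite3 module. The oaths name and its rows come from the answer above; other_info is a stand-in column, and the rest is a minimal assumption about how the tables would be declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE codes   (id INTEGER PRIMARY KEY, code TEXT NOT NULL);
    CREATE TABLE members (id INTEGER PRIMARY KEY, other_info TEXT);
    -- Relationship table replacing codes.member_id and members.code_id.
    CREATE TABLE oaths (
        code_id   INTEGER NOT NULL REFERENCES codes(id),
        member_id INTEGER NOT NULL REFERENCES members(id),
        UNIQUE (code_id, member_id)
    );
    INSERT INTO codes VALUES (1, '123'), (2, '234'), (3, '345'), (4, '456');
    INSERT INTO members VALUES (1, 'blabla'), (2, 'blabla'), (3, 'blabla');
    INSERT INTO oaths VALUES (1, 2), (2, 1), (4, 3);
""")

# The unique constraint rejects recording the same pairing twice.
try:
    conn.execute("INSERT INTO oaths VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Each relationship is stored exactly once, so the two sides cannot disagree.
claimed_code_ids = [r[0] for r in conn.execute(
    "SELECT code_id FROM oaths ORDER BY code_id")]
print(duplicate_rejected, claimed_code_ids)
```

If each code may have only one member and each member only one code, separate UNIQUE constraints on code_id and on member_id would express that more strictly than the composite constraint shown here.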