I was wondering whether there can be a fact table whose keys don't belong to any of the dim tables, even though the fact table appears to contain the dim data.
The reason I came up with this question is that I was looking into a package which uses a dim table and a fact table to pull data from, manipulate it, and then dump it into the fact table. But when I tried to find any dependencies on the fact table (in the DSV Add/Remove Tables dialog box, I added the fact table and then clicked on Related Tables), there were none.
And my claim is that the fact table gets some of its data from the dim tables.
Correct me if I am wrong.
Does your Fact table have columns which contain dimension keys, but are not constrained with a foreign key? I assume SSAS uses the foreign keys to identify related tables, so in this case, it wouldn't detect those tables. You can add related tables manually.
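If that is the case, declaring the missing constraint lets the DSV detect the relationship automatically. A minimal sketch, with assumed fact/dimension names:
ALTER TABLE FactSales
  ADD CONSTRAINT FK_FactSales_DimCustomer
  FOREIGN KEY (CustomerKey) REFERENCES DimCustomer (CustomerKey);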
Another possibility is that the Fact table contains all the dimensions internally in a denormalized form. Rather than having dimension tables and keys to dimension members, all the data is stored in string form in the Fact table. If this is the case, you can create dimensions from the columns in the Fact table.
Is the fact table a table or named query?
Related
I need to create a bus matrix, and in order to do that I need to know which fact table has relationships with which dimension tables.
Unfortunately, in this new project I'm on, there seem to be no FKs (crazy, I know).
What I thought of is using the ETL queries and checking the joins between the fact table and the dimension tables.
What I'm worried about is that there might be more relationships that are not included in the ETL queries... any advice?
You can use the system metadata tables to list the foreign key references:
select tbname, pkcolnames, reftbname, fkcolnames, colcount
from SYSIBM.SYSRELS B;
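Note that SYSIBM.SYSRELS is DB2-specific. On platforms that expose the standard information_schema views, a rough equivalent, offered as a sketch rather than a guaranteed cross-platform query, is:
-- List each foreign key column and the table/column it references
SELECT fk.table_name  AS fact_table,
       fk.column_name AS fk_column,
       pk.table_name  AS referenced_table,
       pk.column_name AS pk_column
FROM information_schema.referential_constraints rc
JOIN information_schema.key_column_usage fk
  ON fk.constraint_name = rc.constraint_name
JOIN information_schema.key_column_usage pk
  ON pk.constraint_name = rc.unique_constraint_name
 AND pk.ordinal_position = fk.ordinal_position;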
If the database does not have properly declared foreign key relationships, then the database does not have the information you are looking for.
Assuming the DB holds no information about the FKs (or information that would help you derive them, like identical column names), then, as you mentioned, examining the ETL code used to load each fact table is probably the only other way of doing this. The ETL must be running a lookup on each dimension to get the PK to insert into the fact record, so the information will be there.
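For example, a typical fact load makes those relationships explicit in its joins (the staging and dimension names here are hypothetical):
-- Each JOIN to a dimension reveals one fact-to-dimension relationship
INSERT INTO FactSales (DateKey, CustomerKey, SalesAmount)
SELECT d.DateKey,
       c.CustomerKey,
       s.sale_amount
FROM staging_sales s
JOIN DimDate d ON d.FullDate = s.sale_date
JOIN DimCustomer c ON c.CustomerNo = s.customer_no;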
There shouldn't be any relationships involving facts that you couldn't determine with this approach. There may be additional relationships between dimensions (bridge tables, more complex SCD types, etc.), but if you sort out the fact relationships first, what remains should be a small enough subset to resolve manually (i.e. by intelligent guesswork).
I have different tables in my schema with different columns, but I want to store data about when the table was modified or when the data was stored, so I added some columns to specify that.
I realized that I had to add the same "modification_date" and "modification_time" columns to all my tables, so I thought about making a new table called DATA_INFO so I wouldn't need to, but every table has a different PRIMARY KEY and I don't know which one to add as a FOREIGN KEY to the DATA_INFO table.
I don't know if I have to add all of them, or if there is another way to do what I need.
It's better to have the same "modification_datetime" column in all tables, rather than trying to keep that data in a central table.
That's what we have done at every shop I've worked in.
I want to emphasize that a separate table is not reasonable for this purpose. The lack of an obvious foreign key is a hint.
Unlike Tab Allerman, I find that the tables I create are much less likely to be updated, so I have three additional columns on most tables:
CreatedBy -- the user who created the row
CreatedAt -- when the row was created
CreatedOn -- the system where the row was created
The most important point is that this information can -- in many databases -- be implemented using default values rather than triggers. That is a big advantage of working within a single row. The fewer triggers, the better.
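A minimal sketch of that approach, assuming SQL Server syntax (the table and its first two columns are just illustrative):
CREATE TABLE Orders (
    OrderId    INT IDENTITY PRIMARY KEY,
    CustomerId INT NOT NULL,
    CreatedBy  SYSNAME   NOT NULL DEFAULT SUSER_SNAME(),  -- user who created the row
    CreatedAt  DATETIME2 NOT NULL DEFAULT SYSDATETIME(),  -- when the row was created
    CreatedOn  SYSNAME   NOT NULL DEFAULT @@SERVERNAME    -- system where the row was created
);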
I have a problem finding a way to represent multiple hash tables in a single table.
Say I have 3 tables with the format:
Table1(Table1_PK1,Table1_PK2,Table1_PK3,Table1_Hash)
Table2(Table2_PK1,Table2_PK2,Table2_Hash)
Table3(Table3_PK1,Table3_PK2,Table3_PK3,Table3_PK4,Table3_PK5,Table3_Hash)
Table1_PK1,Table1_PK2,Table1_PK3... are columns and they might have different datatypes (VARCHAR, INT or DATETIME ...).
My question is whether there is a way to create a single table (with a fixed number of columns) that can represent all of these 3 tables (maybe more in practice).
I am trying to do this for my database tool. Each table contains primary key columns and a hash value associated with them.
Since you're apparently building a database tool, not a database, it might make more sense to do this in application code rather than in a database table.
In a different answer, you commented
I am still looking for a dynamic way to do it without knowing how many primary keys a table can have.
A table can have only one primary key. That primary key can consist of more than one column, though. (You already knew this; you were just using the wrong words, which might confuse others.)
A table can also have an arbitrary number of other keys, which will be either declared (as NOT NULL UNIQUE) or "undeclared" (by creating an index that guarantees uniqueness over a set of columns).
You can look all that stuff up at run time in one or both of two ways. (Links go to documentation for PostgreSQL.)
System tables, sometimes called system catalogs
information_schema views
As far as I know, all modern SQL platforms implement at least one of these interfaces. The information_schema views are covered in the SQL standards, but there seems to be some room for interpretation. They don't look quite the same on all platforms.
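For example, you can discover the primary-key columns of a table at run time through the information_schema views (the shape shown here is PostgreSQL-flavored but close to the standard):
-- Returns the PK columns of 'table1' in key order
SELECT kcu.column_name,
       kcu.ordinal_position
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON kcu.constraint_name = tc.constraint_name
 AND kcu.table_schema = tc.table_schema
WHERE tc.constraint_type = 'PRIMARY KEY'
  AND tc.table_name = 'table1'
ORDER BY kcu.ordinal_position;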
Why combine the 3 tables into one? It would be really bad DB design. But here's a way to do it:
The one table will have a column for each of the 3 tables' columns you want in the final table. I am making the assumption that TableX_Hash is the same type, so that remains as one unique column:
Table_All_in_One (
Table1_PK1,
Table1_PK2,
Table1_PK3,
# space just for clarity of grouping
Table2_PK1,
Table2_PK2,
Table3_PK1,
Table3_PK2,
Table3_PK3,
Table3_PK4,
Table3_PK5,
TableX_Hash # Assuming all the _Hash'es are the same type+length,
# otherwise, add Table1_Hash, Table2_Hash, Table3_Hash
# This can be your new primary key
)
The Primary Keys (PKx) are required to be non-NULL only in their own tables. For this table, they have to allow nulls. The idea is that each row of this new table will only hold the data for one of the tables. The other columns will be empty for that row. If you want to associate the row of one table with another, you can add that to the same row or add FK_Table1_Hash, FK_Table2_Hash and FK_Table3_Hash columns which will refer to the TableX_Hash value of a record.
PS: I wonder if what you are really looking for is a View and not this really bad all-in-one table.
Edit: Combining them into one "without knowing how many primary keys a table can have", as per your comment:
Store all the _PKs concatenated into one column:
Table_All_in_One (
New_PK,
TableX_Hash,
Table1_PKx, # Concatenated PKs of Table1
Table2_PKx, # Concatenated PKs of Table2, etc.
...,
# OR just one
TableX_PKs, # concatenate all the PK's into one VARCHAR field
# Add a pipe `|` between them optionally.
Table_Num # If using just one, then you'll need to store the table number
)
You will not be able to conveniently pick records based on part of their composite primary key. It will always have to be TableX_PKs = CONCAT_WS('|', Table1_PK1, Table1_PK2, ...). So your only dependency is the number of PKs in the original table.
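A sketch of loading and looking up under the single-column variant (MySQL syntax, since CONCAT_WS is used above; this assumes New_PK is auto-generated):
-- Load Table1's rows, collapsing its composite key into one string
INSERT INTO Table_All_in_One (TableX_Hash, TableX_PKs, Table_Num)
SELECT Table1_Hash,
       CONCAT_WS('|', Table1_PK1, Table1_PK2, Table1_PK3),
       1
FROM Table1;

-- Every lookup must rebuild the full concatenated key
SELECT TableX_Hash
FROM Table_All_in_One
WHERE Table_Num = 1
  AND TableX_PKs = CONCAT_WS('|', 'pk1-value', 'pk2-value', 'pk3-value');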
In order to model a bunch of tables this way, you will need three tables. An entity table (called a factor or entity table) that contains the names of the tables you wish to set up this way. A factor_detail table that contains all the columns and their associated properties for those tables. And a factor_detail_value table for storing things like lookup values for lookup tables. I'm trying to learn more about this myself, because we are using this technique at work.

You generate SQL on the fly for any table encoded this way, and store the data in a repository pertinent to the data itself. This way, if a table changes and you need to add a column or change a datatype, you can add a row to the factor_detail table without a database shutdown in production. In most businesses, a four-hour shutdown to make a SQL table change can cost thousands of dollars. If you are dealing with insurance, for example, each additional state you sell insurance in has different requirements for being able to sell it, and that will result in table changes. We reduced our table count way down from over 700 tables in this manner, and we can make changes without a database shutdown, thus avoiding the loss in revenue.
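A minimal sketch of that layout (the datatypes are assumptions; the table names follow the description above):
CREATE TABLE factor (
    factor_id  INT          PRIMARY KEY,
    table_name VARCHAR(128) NOT NULL        -- logical table being modeled
);

CREATE TABLE factor_detail (
    factor_detail_id INT          PRIMARY KEY,
    factor_id        INT          NOT NULL REFERENCES factor (factor_id),
    column_name      VARCHAR(128) NOT NULL,
    data_type        VARCHAR(64)  NOT NULL  -- e.g. 'VARCHAR(50)', 'INT'
);

CREATE TABLE factor_detail_value (
    factor_detail_id INT NOT NULL REFERENCES factor_detail (factor_detail_id),
    row_id           INT NOT NULL,          -- identifies one logical row
    value            VARCHAR(4000),         -- everything stored as text
    PRIMARY KEY (factor_detail_id, row_id)
);
Adding a column to a modeled table then becomes an INSERT into factor_detail rather than an ALTER TABLE and an outage.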
When I create a view I can base it on multiple columns from different tables.
When I want to create a lookup table I need information from one table, for example the foreign key of an order table, to get customer details from another table. I can create a view having parameters to make sure it will get all the data that I need. I could also - from what I have been reading - make a lookup table. What is the difference in this case, and when should I choose a lookup table? I hope this isn't a bad question; I'm not very into DBs yet ;).
Creating a view gives you a "live" representation of the data as it is at the time of querying. This comes at the cost of higher load on the server, because it has to determine the values for every query.
This can be expensive, depending on table sizes, database implementations and the complexity of the view definition.
A lookup table, on the other hand, is usually filled "manually", i.e. not every query against it will cause an expensive operation to fetch values from multiple tables. Instead, your program has to take care of updating the lookup table should the underlying data change.
Usually lookup tables lend themselves to things that rarely change but are read often. Views, on the other hand - while more expensive to execute - are more current.
I think your usage of "Lookup Table" is slightly awry. In normal parlance a lookup table is a code or reference data table. It might consist of a CODE and a DESCRIPTION or a code expansion. The purpose of such tables is to provide a list of permitted values for restricted columns, things like CUSTOMER_TYPE or PRIORITY_CODE. This category of table is often referred to as "standing data" because it changes very rarely, if at all. The value of defining this data in lookup tables is that they can be used in foreign keys and to populate dropdowns and lists of values.
What you are describing is a slightly different scenario:
"I need information from one table, for example the foreign key of an order table, to get customer details from another table"
Both these tables are application data tables. Customer and Order records are dynamic. Now it is obviously valid to retrieve additional data from the Customer table to display alongside the Order data, and in that sense Customer is a "lookup table". More pertinently, it is the parent table of Order, because it has the primary key referenced by the foreign key on Order.
By all means build a view to capture the joining logic between Order and Customer. Such views can be quite helpful when building an application that uses the same joined tables in several places.
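For instance, a sketch of such a view (all column names are assumptions):
CREATE VIEW order_with_customer AS
SELECT o.order_id,
       o.order_date,
       c.customer_id,
       c.customer_name
FROM orders o
JOIN customers c
  ON c.customer_id = o.customer_id;  -- the FK on orders referencing its parent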
Here's an example of a lookup table. We have a system that tracks Jurors; one of the tables is JurorStatus. This table contains all the valid StatusCodes for Jurors:
Code: Value
WS : Will Serve
PP : Postponed
EM : Excuse Military
IF : Ineligible Felon
This is a lookup table for the valid codes.
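In DDL terms, that lookup table and its use as a foreign key might look like this (column names and types are assumptions):
CREATE TABLE JurorStatus (
    StatusCode  CHAR(2)     PRIMARY KEY,  -- 'WS', 'PP', 'EM', 'IF'
    Description VARCHAR(50) NOT NULL
);

CREATE TABLE Juror (
    JurorId    INT     PRIMARY KEY,
    StatusCode CHAR(2) NOT NULL REFERENCES JurorStatus (StatusCode)
);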
A view is like a query.
Read this tutorial and you may find helpful info when a lookup table is needed:
SQL: Creating a Lookup Table
Just learn to write SQL queries to get exactly what you need. No need to create a view! Views are not good to use in many instances, especially if you start to base them on other views, which will kill performance. Do not use views just as a shorthand for query writing.
I have a table that contains a few columns and then two final (nullable) columns which are varbinary (actually, they are SQL 2008 geography types, but I want to keep this post database-agnostic).
I've hit around 500 MB with around 200K rows. The varbinary is the problem - and I need the data.
So, I was wondering if it's bad if I do the following:
Create a separate FILEGROUP: SpatialData.mdf
Create a new table, assigned to that new filegroup.
Move all the spatial data (read: last two fields) out of the original table and into the new table. The new table has a foreign key against the original table.
Create a view representing both tables.
Now, the view will be a left outer join, because the new table has a zero-or-one-row relationship to the original table.
E.g.
Original Table
FooId INT PK NOT NULL IDENTITY
Blah VARCHAR(..) NOT NULL
Boo WHATEVER NOT NULL
New Table
FooID PK FK NOT NULL
Spatial_A VARBINARY(MAX)/GEOGRAPHY
Spatial_B VARBINARY(MAX)/GEOGRAPHY
The reason I want to know whether this is bad is the view and how it joins to the spatial table. I'll be using the view a lot. Currently, I'm just doing queries against the original table (because the new table doesn't exist just yet). Will adding this join and the PK/FK relationship impact performance?
Why split the data? I need to download the live DB to our dev servers now and then. We don't really care too much about those two spatial fields, so not having them is fine. Therefore, the size of the database to download will be much smaller.
Thoughts?
Instead of creating a second table, joining, and creating a view, a better solution that is possible with SQL Server 2005/2008 is to use table partitioning. To my recollection, you can vertically partition a table, and place some columns (i.e. your geospatial columns) in one file group, while putting the rest in another file group. SQL Server will handle the rest for you, you don't need to bother with a join, and you don't need a view.
The method that you've described is actually fairly common, in my experience. Technically, if you were to normalize your database to the fullest extent, you would have a lot of tables like that, since one of the (usually skipped) steps in normalization is making sure that no columns have NULL values.
In practice it isn't usually carried out to that extent, but for a column (or columns) that is sparsely populated it's not a bad idea to separate it out. As long as the tables share the same primary key (which will of course be indexed), performance shouldn't be a problem.
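For illustration, here is a sketch of the split and the re-joining view using the question's column names (the table names and SQL Server geography types are assumptions):
CREATE TABLE FooSpatial (
    FooId     INT PRIMARY KEY REFERENCES Foo (FooId),  -- shared, indexed key
    Spatial_A GEOGRAPHY,
    Spatial_B GEOGRAPHY
);

CREATE VIEW FooWithSpatial AS
SELECT f.FooId, f.Blah, f.Boo,
       s.Spatial_A, s.Spatial_B
FROM Foo f
LEFT OUTER JOIN FooSpatial s  -- zero or one spatial row per Foo
  ON s.FooId = f.FooId;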