How to represent metadata in a data warehouse - SSAS

I am creating a data warehouse. I have built a database that will serve as the warehouse and will hold the dimension and fact tables. I know that, besides dimension and fact tables, a data warehouse should also contain metadata. My question is: what structure should the metadata have, and what information should it hold?

What about having Azure Data Catalog capture the table and column names and let developers and business users collaborate on the definitions in a UI? This would be a great data dictionary.
http://azure.microsoft.com/en-us/services/data-catalog/
Or if you want to roll your own with extended properties you might do the following:
https://www.mssqltips.com/sqlservertip/1637/script-to-build-a-sql-server-data-dictionary-and-report-with-microsoft-excel/
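If you go the roll-your-own route, a minimal sketch of the idea might look like the following. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so a plain table (the hypothetical meta_column_description) plays the role that extended properties would play there. DimCustomer and its columns are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server here; the table and column
# names (DimCustomer, meta_column_description) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (
    CustomerKey  INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL,
    Region       TEXT
);
-- One row per documented column: the "business" half of the dictionary.
CREATE TABLE meta_column_description (
    table_name  TEXT NOT NULL,
    column_name TEXT NOT NULL,
    description TEXT NOT NULL,
    PRIMARY KEY (table_name, column_name)
);
INSERT INTO meta_column_description VALUES
    ('DimCustomer', 'CustomerKey',  'Surrogate key generated by the ETL'),
    ('DimCustomer', 'CustomerName', 'Customer name as held in the source system');
""")

def data_dictionary(conn, table):
    """Return (column, declared type, description) rows for one table."""
    descriptions = dict(conn.execute(
        "SELECT column_name, description FROM meta_column_description "
        "WHERE table_name = ?", (table,)))
    # pragma_table_info supplies the technical metadata (name, declared type).
    columns = conn.execute(
        "SELECT name, type FROM pragma_table_info(?)", (table,)).fetchall()
    # Undocumented columns get an empty description so gaps are visible.
    return [(name, ctype, descriptions.get(name, "")) for name, ctype in columns]

dictionary = data_dictionary(conn, "DimCustomer")
for row in dictionary:
    print(row)
```

The point is only that the technical metadata (tables, columns, types) can be read from the catalog, while the descriptions have to be stored somewhere people can maintain them.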

Related

Advice about Designing SQL Database for catalog service for a marketplace

I'm currently designing a catalog service (SQL database), and I'm trying to combine SQL and NoSQL to store data, for example in the Data field of the item table.
Catalog service:
The Item_type table allows choosing between item types such as simple, variable, digital...
The attribute table has no direct relationship with the item table, but its rows are used to fill the attribute field in the item table, where they are stored as JSON.
The Data field in the item table is a jsonb field; it is used to store data such as variable products with their prices, prices related to quantity...
RelatedStoreID is the store id from store service.
Questions:
Is it good to store attributes in a JSON field, especially if these attributes are going to be used for indexing?
How can I improve the design?

SQL Data modeling - Querying records that have tags across multiple categories

I have a table that stores the different software services a company offers. Each service is tagged by the Industry it serves, the LoB it belongs to, and the Technology involved in the service. A service can have multiple tags for each of Industry, LoB, and Technology.
For example, the following could be the master data:
And the transaction data could look like this:
I need to create a view that can be used to query data by Industry, LoB, and Technology tags. For the time being, I have left outer joined all the tag-to-service relation tables (the service-technology, service-LoB, and service-industry tables) to the services transaction table. But this produces a huge number of records, since one service is typically tagged with up to 10-15 industries and technologies.
Just wanted to know what is the optimal way to model this data so that I have provision to query for service by all of the three tags right from within one view.
I am not a Data modeling expert and this is more of my first venture into the data modeling side- so please pardon the 'noob'ness of my question :). I use SAP HANA as the database and expose data via an OData service for which I want to use this view as a datasource.
If you're asking about modeling the data: normally, in your transaction table you keep the foreign keys, not the text columns that can be obtained via those foreign keys from the master tables. I bet that's what you meant as well, but the example shows text values in the transaction tables.
Other than that, I think what you have is sound and reasonable. These "tag" tables represent a different level of granularity from the "services" table, and it can be counterproductive to combine them into a single table (examples: a single column with comma-separated tags, XML / JSON columns, multiple columns [LOBTag1, LOBTag2, ...]) because that makes these columns non-indexable and/or hard to query. XML and JSON columns may offer some optimization, but they should not be considered unless the columns are too many and sparse.
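To illustrate why joining every tag table at once blows up the row count, and one way to query by tags while keeping the separate tag tables, here is a rough sketch. It uses Python's sqlite3 rather than SAP HANA, and all table names and sample rows are hypothetical. Whether a filter of this shape fits a view exposed through OData is something you would need to check on your side.

```python
import sqlite3

# Assumption: SQLite stands in for SAP HANA; all table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service (service_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE service_industry   (service_id INTEGER, industry   TEXT);
CREATE TABLE service_lob        (service_id INTEGER, lob        TEXT);
CREATE TABLE service_technology (service_id INTEGER, technology TEXT);

INSERT INTO service VALUES (1, 'Fraud scoring'), (2, 'Claims portal');
INSERT INTO service_industry   VALUES (1, 'Banking'), (1, 'Insurance'), (2, 'Insurance');
INSERT INTO service_lob        VALUES (1, 'Risk'), (2, 'Operations');
INSERT INTO service_technology VALUES (1, 'ML'), (1, 'Cloud'), (2, 'Cloud');
""")

# Joining all three tag tables yields industries x LoBs x technologies
# rows per service (2 * 1 * 2 = 4 for service 1).
joined_rows = conn.execute("""
    SELECT COUNT(*) FROM service s
    LEFT JOIN service_industry   i ON i.service_id = s.service_id
    LEFT JOIN service_lob        l ON l.service_id = s.service_id
    LEFT JOIN service_technology t ON t.service_id = s.service_id
    WHERE s.service_id = 1
""").fetchone()[0]

def find_services(conn, industry, technology):
    """Filter with EXISTS so each matching service comes back exactly once."""
    return conn.execute("""
        SELECT s.service_id, s.name FROM service s
        WHERE EXISTS (SELECT 1 FROM service_industry i
                      WHERE i.service_id = s.service_id AND i.industry = ?)
          AND EXISTS (SELECT 1 FROM service_technology t
                      WHERE t.service_id = s.service_id AND t.technology = ?)
        ORDER BY s.service_id
    """, (industry, technology)).fetchall()

matches = find_services(conn, "Insurance", "Cloud")
print(joined_rows, matches)
```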

Data Warehouse Architecture Modeling

I'm trying to architect a data warehouse using the star schema model... any ideas would be appreciated.
Any idea what I should do to create a star schema? Some say that I should have a linking table with DimProjects going to the fact tables. What about project hours? What is the right approach here, or do I need other tables to link them? Employees can work on multiple projects, projects require man hours... etc.
What is the best approach on modeling?
So far I have tables:
[CODE]
Dimension Tables    Measure Tables
----------------    --------------
DimEmployee         FactCRM
DimProjects         FactTargets
DimSalesDetails     FactRevenue
DimAccounts
DimTerritories
DimDate
DimTime
[/CODE]
In a data warehouse schema, dimensions are independent entities. For example:
Dim_Employee
Empid (pk)
Name
Address, etc.
All the other dimensions follow the same pattern. Each dimension's key is linked to your fact tables. In the case above, FactCRM would include only CRM-related measures and would be linked to the specific dimensions it needs, depending on the requirements.
Without knowing the columns, no one can tell what you actually want. Also remember that linking a dimension to a fact is already a partial star schema in itself, so that doesn't lead to any issues. The only caveat is that if your dimensions are themselves normalized, the schema becomes a snowflake.
Another point about facts: if you want to derive some facts from other existing facts, then you have to link the fact tables as well, using a unique fact id. This is called a fact constellation. The schema would then be a star/snowflake schema with a fact constellation.
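For the project hours part of the question, one possible shape is a fact table whose rows carry the keys of the dimensions plus the measure. The sketch below uses Python's sqlite3 only to make it runnable. FactProjectHours and every column name are assumptions, since the question lists only the table names.

```python
import sqlite3

# Assumption: FactProjectHours and all columns are hypothetical; only
# DimEmployee, DimProjects and DimDate are named in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimEmployee (EmployeeKey INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DimProjects (ProjectKey  INTEGER PRIMARY KEY, ProjectName TEXT);
CREATE TABLE DimDate     (DateKey     INTEGER PRIMARY KEY, CalendarDate TEXT);

-- Grain: one row per employee, per project, per day.
CREATE TABLE FactProjectHours (
    EmployeeKey INTEGER NOT NULL REFERENCES DimEmployee(EmployeeKey),
    ProjectKey  INTEGER NOT NULL REFERENCES DimProjects(ProjectKey),
    DateKey     INTEGER NOT NULL REFERENCES DimDate(DateKey),
    HoursWorked REAL    NOT NULL
);

INSERT INTO DimEmployee VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO DimProjects VALUES (10, 'Migration'), (20, 'Website');
INSERT INTO DimDate     VALUES (20240101, '2024-01-01');
-- Ann works on two projects; the fact rows express that many-to-many link.
INSERT INTO FactProjectHours VALUES
    (1, 10, 20240101, 4.0),
    (1, 20, 20240101, 3.5),
    (2, 10, 20240101, 8.0);
""")

hours_by_project = conn.execute("""
    SELECT p.ProjectName, SUM(f.HoursWorked)
    FROM FactProjectHours f
    JOIN DimProjects p ON p.ProjectKey = f.ProjectKey
    GROUP BY p.ProjectName
    ORDER BY p.ProjectName
""").fetchall()
print(hours_by_project)
```

With this layout, the fact table itself links employees to projects, so no separate linking table is needed for this particular measure.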

Star Schema Design / best practice

I am working with a system, which has 4 databases:
Account (Storing bank accounts, transactions, etc)
Client (Client related info)
Credit (getting rates from 3rd party system)
Quality (Further internal calculation)
I want to create 4 fact tables, one for each database. For example, I will have an Account fact table with ClientAccount, Transaction, and Provider as its dimension tables. I will have 3 similar fact tables for the other databases.
My question is: does it make sense to put each fact table in its corresponding database, i.e. create the Accounting fact and dimension tables in the Account database? Or is it better to create a new database for the whole star schema and keep all the dimension and fact tables there?
Without knowing too much about the system, I would suggest these are dimension tables rather than fact tables.
A dimension table represents an entity or an object that you can use to construct a fact. Accounts and clients seem like a good fit for this. I'm not sure what Credit and Quality are but they may be dimensions as well.
Your fact table should represent transaction-like records. This could be sales, transactions, phone calls, or whatever your data warehouse is reporting on. This fact table would then have foreign keys to each of the dimension tables.
Regarding a single or multiple databases: I would suggest storing it in a single database. It's easier to use that way, and you don't have to worry about database links when querying your data. Your ETL process for populating these fact and dimension tables can extract the data from these four databases and load it into one database, and from there, you can build the cubes in a single database.
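As a rough illustration of that ETL idea (extract from several source databases, load into one warehouse database), here is a small sketch using Python's sqlite3. Attached in-memory databases stand in for the separate source databases, and every table and column name is hypothetical.

```python
import sqlite3

# Assumption: attached in-memory SQLite databases stand in for the separate
# source databases; every table and column name here is hypothetical.
dw = sqlite3.connect(":memory:")
dw.execute("ATTACH DATABASE ':memory:' AS account_src")
dw.execute("ATTACH DATABASE ':memory:' AS client_src")
dw.executescript("""
CREATE TABLE client_src.client (client_id INTEGER, client_name TEXT);
CREATE TABLE account_src.account_transaction (client_id INTEGER, amount REAL);
INSERT INTO client_src.client VALUES (7, 'Acme Ltd'), (8, 'Globex');
INSERT INTO account_src.account_transaction
    VALUES (7, 100.0), (7, -25.0), (8, 40.0);

-- Warehouse tables live in the main database only.
CREATE TABLE DimClient (
    ClientKey  INTEGER PRIMARY KEY AUTOINCREMENT,
    ClientId   INTEGER UNIQUE,
    ClientName TEXT
);
CREATE TABLE FactTransaction (
    ClientKey INTEGER NOT NULL REFERENCES DimClient(ClientKey),
    Amount    REAL    NOT NULL
);

-- Load the dimension first, then look up its keys while loading the fact.
INSERT INTO DimClient (ClientId, ClientName)
    SELECT client_id, client_name FROM client_src.client ORDER BY client_id;
INSERT INTO FactTransaction (ClientKey, Amount)
    SELECT d.ClientKey, t.amount
    FROM account_src.account_transaction t
    JOIN DimClient d ON d.ClientId = t.client_id;
""")

# Reporting queries now touch only the warehouse tables.
totals = dw.execute("""
    SELECT d.ClientName, SUM(f.Amount)
    FROM FactTransaction f JOIN DimClient d ON d.ClientKey = f.ClientKey
    GROUP BY d.ClientName ORDER BY d.ClientName
""").fetchall()
print(totals)
```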
Unless your data volume is very small, your data warehouse should be housed in a separate database from the transactional data. A DW has a different usage pattern (OLTP vs OLAP) and will generally have a different maintenance window.
I would recommend creating all of your Dims and Facts in a single dedicated DW database. I can't think of any benefit to separating them and it would reduce your DBA overhead by not having extra databases to manage/secure/audit/document.
As for Dimensions vs Facts, data from the OLTP Account table would be used to create a Dim and a Fact. DimAccount at the very least would be a degenerate dimension containing just the account number. You'd have to review your data to determine if any of the other records are generic attributes of the Account specifically. FactAccount would contain references to the other Dimensions (DimAccountType, DimCustomer, DimLocation, etc)
Think of the dimensions as the values from lookup tables/dropdown lists, which exist prior to any events happening. For example, a bank can offer Checking & Savings accounts, even if they do not yet have any accounts.
Facts document an event. When an account is created, the fact record will reference all of the dimensions that describe the event, and record the measurable values associated with the event, if any.
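The Checking & Savings example can be sketched like this, using Python's sqlite3 just to make it runnable. The columns of DimAccountType and FactAccount are assumptions made for illustration.

```python
import sqlite3

# Assumption: the columns of DimAccountType and FactAccount are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimAccountType (AccountTypeKey INTEGER PRIMARY KEY, TypeName TEXT);
CREATE TABLE FactAccount (
    AccountNumber  TEXT    NOT NULL,  -- account number kept on the fact row
    AccountTypeKey INTEGER NOT NULL REFERENCES DimAccountType(AccountTypeKey),
    OpeningDeposit REAL    NOT NULL   -- measure recorded with the event
);
-- The bank offers both products before any account exists.
INSERT INTO DimAccountType VALUES (1, 'Checking'), (2, 'Savings');
-- Event: one checking account is opened.
INSERT INTO FactAccount VALUES ('ACC-001', 1, 500.0);
""")

# LEFT JOIN from the dimension keeps members that have no facts yet.
accounts_per_type = conn.execute("""
    SELECT t.TypeName, COUNT(f.AccountNumber)
    FROM DimAccountType t
    LEFT JOIN FactAccount f ON f.AccountTypeKey = t.AccountTypeKey
    GROUP BY t.TypeName ORDER BY t.TypeName
""").fetchall()
print(accounts_per_type)
```

Savings still shows up in the result with a count of zero, which is the sense in which the dimension member exists before any event references it.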

Creation of dimension and fact tables from source tables

I have imported my flat files into SQL Server 2012 and created a few tables (source tables). I need to build a cube in SSAS, but it seems I first need to make "dimension" and "fact" tables with proper PK/FK relations. Could someone tell me whether I need to:
1. create empty dimABC, dimXYZ tables manually, with the PK identified?
2. copy data from the source tables (imported above) into these new dimXXX tables through some SQL query?
3. then create a new factXXX table and copy the required facts (data) from the source tables above?
Then I need to use these tables during cube build process.
I appreciate your help in clarifying my steps 1,2,3.
You're pretty close on your steps. It sounds like you are new to data warehousing? You might want to check out The Kimball Group's Data Warehouse Toolkit or website to ensure you get your dimensions and facts built correctly.
You have your data in "staging" meaning you have imported your raw data into SQL Server. You will need to create dimension tables with surrogate keys (just auto-incremented identity values) and then create fact tables that use these surrogate keys as foreign keys. You could probably do all of this in straight SQL, but this is what SSIS is for. Once you have your facts and dimensions defined and populated, best practice is to create views to use in the DSV for your cube.
Once you have your views populated and in your DSV in SSAS, you will build the dimensions and facts and then relate them in the cube. If you define the relationships in the DSV, the relationships will be mostly populated in the Dimension usage tab for you.
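The staging, dimension, fact, and view steps described above could look roughly like this in plain SQL. The sketch runs the SQL through Python's sqlite3, so AUTOINCREMENT stands in for a SQL Server IDENTITY column. The names stg_sales, dimProduct, factSales and the two views are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server (AUTOINCREMENT plays the role
# of an IDENTITY column); stg_sales, dimProduct, factSales are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Staging: raw rows as imported from the flat file.
CREATE TABLE stg_sales (product_code TEXT, product_name TEXT, qty INTEGER);
INSERT INTO stg_sales VALUES
    ('P1', 'Widget', 2), ('P2', 'Gadget', 1), ('P1', 'Widget', 5);

-- Step 1: empty dimension with a surrogate primary key.
CREATE TABLE dimProduct (
    ProductKey  INTEGER PRIMARY KEY AUTOINCREMENT,
    ProductCode TEXT UNIQUE,   -- business key from the source
    ProductName TEXT
);
-- Step 2: copy the distinct members from staging.
INSERT INTO dimProduct (ProductCode, ProductName)
    SELECT DISTINCT product_code, product_name FROM stg_sales
    ORDER BY product_code;

-- Step 3: fact table that references the surrogate key.
CREATE TABLE factSales (
    ProductKey INTEGER NOT NULL REFERENCES dimProduct(ProductKey),
    Quantity   INTEGER NOT NULL
);
INSERT INTO factSales (ProductKey, Quantity)
    SELECT d.ProductKey, s.qty
    FROM stg_sales s JOIN dimProduct d ON d.ProductCode = s.product_code;

-- Views for the cube's DSV to read from.
CREATE VIEW vDimProduct AS SELECT ProductKey, ProductName FROM dimProduct;
CREATE VIEW vFactSales  AS SELECT ProductKey, Quantity FROM factSales;
""")

quantity_by_product = conn.execute("""
    SELECT d.ProductName, SUM(f.Quantity)
    FROM vFactSales f JOIN vDimProduct d ON d.ProductKey = f.ProductKey
    GROUP BY d.ProductName ORDER BY d.ProductName
""").fetchall()
print(quantity_by_product)
```

In a real project, SSIS packages would typically do the loading in steps 2 and 3, but the table shapes would be similar.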