I'm a big fan of the data lineage graph generated by dbt.
However, this only shows the relationships between tables & views created by dbt. Is there a way to show the dependencies to "original" tables in the database?
Say, if my dbt model creates a view called open_orders as
SELECT * FROM orders WHERE status='Open'
I'd like to have the data lineage graph show that open_orders depends on orders.
You could use dbt sources for that. If you declare your orders table as a source and reference it in the model through the source() function, it will appear in the lineage graph.
I'm trying to architect a data warehouse in the Star Schema model... any ideas would be appreciated.
Any idea what I should do to create a Star Schema? Some say that I should have a linking table with DimProjects going to the fact tables. What about project hours? What is the right approach to this, or do I need other tables to link? Employees can work on multiple projects, projects require man-hours, etc.
What is the best approach to modeling this?
So far I have tables:
[CODE]
Dimension Tables    Measure Tables
----------------    --------------
DimEmployee         FactCRM
DimProjects         FactTargets
DimSalesDetails     FactRevenue
DimAccounts
DimTerritories
DimDate
DimTime
[/CODE]
Dimensions in a data warehouse schema are independent entities, for example:
Dim_Employee
Empid (pk)
Name
Address, etc.
and likewise for all the other dimensions.
Each dimension's key is linked to your fact tables. In the case above, FactCRM would include only CRM-related measures and would be linked to its specific dimensions, depending on the requirements.
Without knowing the columns, no one will be able to tell what you actually want. Also remember that linking a dimension to a fact is already a partial star schema in itself, so that doesn't lead to any issues. The only caveat is that if your dimensions are themselves normalized, the schema becomes a snowflake.
Another point about facts: if you want to derive facts from other existing facts, then you have to link the fact tables as well, using a unique fact ID. This is called a fact constellation. The schema then becomes a star/snowflake schema with a fact constellation.
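As a rough illustration of dimension keys linked to a fact table, here is a minimal sketch using Python's built-in sqlite3 module. The FactProjectHours table, its columns, and the sample rows are assumptions made up for the example; they do not come from the question. In this sketch the fact table itself resolves the many-to-many relation between employees and projects, because each row records hours for one employee on one project on one day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimEmployee (
        EmployeeKey INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL
    );
    CREATE TABLE DimProjects (
        ProjectKey  INTEGER PRIMARY KEY,
        ProjectName TEXT NOT NULL
    );
    -- Hypothetical fact table: one row per employee, project and work date.
    CREATE TABLE FactProjectHours (
        EmployeeKey INTEGER NOT NULL REFERENCES DimEmployee(EmployeeKey),
        ProjectKey  INTEGER NOT NULL REFERENCES DimProjects(ProjectKey),
        WorkDate    TEXT    NOT NULL,
        Hours       REAL    NOT NULL
    );

    INSERT INTO DimEmployee VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO DimProjects VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO FactProjectHours VALUES
        (1, 1, '2024-01-02', 4.0),
        (1, 2, '2024-01-02', 4.0),
        (2, 1, '2024-01-02', 8.0);
""")

# Total hours per project, resolved through the dimension key.
hours_per_project = conn.execute("""
    SELECT p.ProjectName, SUM(f.Hours)
    FROM FactProjectHours AS f
    JOIN DimProjects AS p USING (ProjectKey)
    GROUP BY p.ProjectName
    ORDER BY p.ProjectName
""").fetchall()
```

Whether project hours belong in their own fact table or in an existing one depends on the actual columns and reporting needs.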
I have the following situation: there are 2 tables in my SQLite database, one of which is called "assets" and the other "operations".
"Assets" includes a list of market assets, each of which has a specific "asset type" value alongside its name.
"Operations" has lots of columns, one of which is "asset"; each operation is done with one single asset.
Now I have a table in my app that I want to populate with "all operations related to a specific asset type", and this needs to be done quickly. My question is: how can I do this in a way that is both fast and elegant?
I have two ways now, but both of them are inadequate. One is simply to add a new column called "asset type" to the operations table, making the search pretty straightforward. The problem is that this is far from elegant, since there is no direct connection between operations and asset types. Another solution would be to first fetch a list of assets of some type and then search the "operations" table only for "assets included on this list". But this would be a processing monster, since there could be dozens to hundreds of assets per type.
Is there any other way that I haven't figured out or found?
Just do it in the obvious way:
SELECT *
FROM Operations
WHERE AssetID IN (SELECT AssetID
                  FROM Assets
                  WHERE AssetType = ?);
The same could be done with a join:
SELECT Operations.*
FROM Operations
JOIN Assets USING (AssetID)
WHERE AssetType = ?;
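Both queries can be tried out with Python's built-in sqlite3 module. This is only a sketch: the column names other than AssetID and AssetType, the index names, and the sample rows are assumptions. The two indexes are an addition not mentioned above; they are the usual way to keep this kind of lookup fast as the tables grow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Assets (
        AssetID   INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        AssetType TEXT NOT NULL
    );
    CREATE TABLE Operations (
        OperationID INTEGER PRIMARY KEY,
        AssetID     INTEGER NOT NULL REFERENCES Assets(AssetID),
        Quantity    REAL    NOT NULL
    );
    -- Assumed indexes on the filter column and the join column.
    CREATE INDEX idx_assets_type      ON Assets(AssetType);
    CREATE INDEX idx_operations_asset ON Operations(AssetID);

    INSERT INTO Assets VALUES
        (1, 'ACME', 'Stock'), (2, 'GOV10', 'Bond'), (3, 'INIT', 'Stock');
    INSERT INTO Operations VALUES
        (1, 1, 10), (2, 2, 5), (3, 3, 7), (4, 1, -4);
""")


def operations_by_type_subquery(asset_type):
    """All operations whose asset has the given type, via IN (subquery)."""
    return conn.execute(
        """SELECT *
           FROM Operations
           WHERE AssetID IN (SELECT AssetID FROM Assets WHERE AssetType = ?)
           ORDER BY OperationID""",
        (asset_type,),
    ).fetchall()


def operations_by_type_join(asset_type):
    """The same result set, expressed as a join."""
    return conn.execute(
        """SELECT Operations.*
           FROM Operations
           JOIN Assets USING (AssetID)
           WHERE AssetType = ?
           ORDER BY OperationID""",
        (asset_type,),
    ).fetchall()


stock_ops = operations_by_type_subquery("Stock")
```

Neither form requires a redundant "asset type" column in Operations, and neither requires fetching the asset list into the application first.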
I'm new to this topic. I've got a database with a flat fact table, which contains data like date, product group, product subgroup, actual product name, and some calculations/statistics. All I need to do is create a report using an OLAP cube. I have two ideas for how to create it, but I don't know which draft is better (or whether either is even correct). The original DAILY_REPORT... table has no primary key; it's just a data table. In the first concept I have created every table (which will serve as a dimension) with an ID, and connected product -> family of product -> project -> building in a hierarchy. The other concept has no IDs and no hierarchy; the relations are created automatically based on names. Can somebody explain which direction I should go in?
First idea:
http://imgur.com/iKNfAXF
Second:
http://imgur.com/IZjW1W6
Thanks in advance!
You can follow these steps to create your cube:
Create a separate view for each of the dimensions you want to have. Group similar types of data in one view, e.g. Product Name, Product Group, Product Sub-Group, etc.
Keep the data in your dimension view DISTINCT, e.g. SELECT DISTINCT [Product Name], [Product Group], [Product Sub-Group] FROM TABLE
Keep an 'ID' column in each dimension view, e.g. Product ID in the Product view.
Create a view for your fact. Include the 'ID' column of each dimension in your fact view. This lets you create relationships on the 'ID' columns, which will be a lot faster than relationships built on names.
For creating hierarchies from dimension attributes, SSAS provides drag-and-drop functionality.
If you need more details let me know.
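The view-based steps above can be sketched with Python's built-in sqlite3 module standing in for SQL Server. The table and column names (daily_report, dim_product, fact_daily) and the sample rows are assumptions, and ROW_NUMBER() is just one possible way to generate the 'ID' column; it needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_report (
        report_date   TEXT,
        product_group TEXT,
        product_name  TEXT,
        quantity      INTEGER
    );
    INSERT INTO daily_report VALUES
        ('2024-01-01', 'Doors',   'Oak door',   3),
        ('2024-01-01', 'Windows', 'Bay window', 2),
        ('2024-01-02', 'Doors',   'Oak door',   5);

    -- Dimension view: DISTINCT products, each with a generated ID.
    CREATE VIEW dim_product AS
    SELECT ROW_NUMBER() OVER (ORDER BY product_group, product_name) AS product_id,
           product_group,
           product_name
    FROM (SELECT DISTINCT product_group, product_name FROM daily_report);

    -- Fact view: carries the dimension ID instead of the names.
    CREATE VIEW fact_daily AS
    SELECT d.product_id, r.report_date, r.quantity
    FROM daily_report AS r
    JOIN dim_product AS d
      ON d.product_group = r.product_group
     AND d.product_name  = r.product_name;
""")

dim_rows = conn.execute(
    "SELECT product_id, product_group, product_name FROM dim_product ORDER BY product_id"
).fetchall()
qty_by_product = dict(
    conn.execute("SELECT product_id, SUM(quantity) FROM fact_daily GROUP BY product_id").fetchall()
)
```

IDs generated this way can shift when new products appear in the source table, so this sketch suits full reloads better than incremental ones.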
You could construct the dimensions you need with views based on distinct queries (i.e. SELECT DISTINCT) over the source data. These can be used to populate the dimensions.
You can make a synthetic date dimension fairly easily.
Then you can create a DSV that joins the views back against the fact table to populate the measure group.
If you need to fake a primary key then you can use a view that annotates the fact table with a column generated from row_number() or some similar means. Note that this is not necessarily stable across runs, so you can't rely on it for incremental loads. However, it would work fine for complete refreshes.
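A small sketch of the last two ideas, again using Python's built-in sqlite3 module in place of SQL Server: a synthetic date dimension generated in code, and a view that adds a row_number() column to a keyless fact table. All table names, column names, and the YYYYMMDD key format are assumptions for illustration; window functions need SQLite 3.25 or later.

```python
import sqlite3
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Return one (date_key, iso_date, year, month, day) row per day, inclusive."""
    rows = []
    day = start
    while day <= end:
        date_key = int(day.strftime("%Y%m%d"))  # e.g. 20240131
        rows.append((date_key, day.isoformat(), day.year, day.month, day.day))
        day += timedelta(days=1)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, iso_date TEXT, "
    "year INTEGER, month INTEGER, day INTEGER)"
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?, ?, ?)",
    build_date_dimension(date(2024, 1, 1), date(2024, 1, 31)),
)

conn.executescript("""
    CREATE TABLE daily_report (report_date TEXT, product_name TEXT, quantity INTEGER);
    INSERT INTO daily_report VALUES
        ('2024-01-01', 'Oak door',   3),
        ('2024-01-01', 'Bay window', 2),
        ('2024-01-02', 'Oak door',   5);

    -- Fake primary key: numbering is recomputed on every query, so it is
    -- only meaningful for a complete refresh, not for incremental loads.
    CREATE VIEW fact_with_key AS
    SELECT ROW_NUMBER() OVER (ORDER BY report_date, product_name) AS fact_id, *
    FROM daily_report;
""")

date_count = conn.execute("SELECT COUNT(*) FROM dim_date").fetchone()[0]
fact_ids = [row[0] for row in conn.execute("SELECT fact_id FROM fact_with_key ORDER BY fact_id")]
```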
I have imported my flat files into SQL Server 2012 and created a few tables (source tables). I need to build a cube in SSAS, but it seems I need to make "dimension" and "fact" tables with proper PK/FK relations. Could someone tell me whether I need to:
1. Create empty dimABC, dimXYZ tables manually, with a PK identified?
2. Copy data from the source tables (imported above) into these new dimXXX tables through some SQL query?
3. Then create a new factXXX table and copy the required facts (data) from the source tables above?
Then I need to use these tables during the cube build process.
I'd appreciate your help in clarifying my steps 1, 2, 3.
You're pretty close on your steps. It sounds like you are new to data warehousing? You might want to check out The Kimball Group's Data Warehouse Toolkit or website to ensure you get your dimensions and facts built correctly.
You have your data in "staging" meaning you have imported your raw data into SQL Server. You will need to create dimension tables with surrogate keys (just auto-incremented identity values) and then create fact tables that use these surrogate keys as foreign keys. You could probably do all of this in straight SQL, but this is what SSIS is for. Once you have your facts and dimensions defined and populated, best practice is to create views to use in the DSV for your cube.
Once you have your views populated and in your DSV in SSAS, you will build the dimensions and facts and then relate them in the cube. If you define the relationships in the DSV, the relationships will be mostly populated in the Dimension usage tab for you.
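The staging-to-dimension-to-fact flow can be sketched in plain SQL. The example below runs it through Python's built-in sqlite3 module rather than SQL Server or SSIS, so INTEGER PRIMARY KEY plays the role of an IDENTITY column; the staging_sales, dimCustomer, dimProduct, and factSales tables and their columns are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging: raw rows as imported from the flat file (hypothetical columns).
    CREATE TABLE staging_sales (customer_name TEXT, product_name TEXT, amount REAL);
    INSERT INTO staging_sales VALUES
        ('Alice', 'Widget', 10.0),
        ('Bob',   'Widget', 20.0),
        ('Alice', 'Gadget',  5.0);

    -- Dimensions: the integer primary key is auto-assigned and acts as
    -- the surrogate key.
    CREATE TABLE dimCustomer (customer_key INTEGER PRIMARY KEY, customer_name TEXT UNIQUE);
    CREATE TABLE dimProduct  (product_key  INTEGER PRIMARY KEY, product_name  TEXT UNIQUE);
    INSERT INTO dimCustomer (customer_name) SELECT DISTINCT customer_name FROM staging_sales;
    INSERT INTO dimProduct  (product_name)  SELECT DISTINCT product_name  FROM staging_sales;

    -- Fact: stores surrogate keys as foreign keys, plus the measures.
    CREATE TABLE factSales (
        customer_key INTEGER NOT NULL REFERENCES dimCustomer(customer_key),
        product_key  INTEGER NOT NULL REFERENCES dimProduct(product_key),
        amount       REAL    NOT NULL
    );
    INSERT INTO factSales
    SELECT c.customer_key, p.product_key, s.amount
    FROM staging_sales AS s
    JOIN dimCustomer AS c ON c.customer_name = s.customer_name
    JOIN dimProduct  AS p ON p.product_name  = s.product_name;
""")

totals = conn.execute("""
    SELECT c.customer_name, SUM(f.amount)
    FROM factSales AS f
    JOIN dimCustomer AS c USING (customer_key)
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
fact_count = conn.execute("SELECT COUNT(*) FROM factSales").fetchone()[0]
```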
When I create a view I can base it on multiple columns from different tables.
When I want to create a lookup table I need information from one table, for example the foreign key of an order table, to get customer details from another table. I can create a view with parameters to make sure it gets all the data that I need. I could also, from what I have been reading, make a lookup table. What is the difference in this case, and when should I choose a lookup table? I hope this isn't a bad question, I'm not very into DBs yet ;).
Creating a view gives you a "live" representation of the data as it is at the time of querying. This comes at the cost of higher load on the server, because it has to determine the values for every query.
This can be expensive, depending on table sizes, database implementations and the complexity of the view definition.
A lookup table, on the other hand, is usually filled "manually", i.e. a query against it does not trigger an expensive operation to fetch values from multiple tables. Instead, your program has to take care of updating the lookup table whenever the underlying data changes.
Lookup tables usually lend themselves to things that seldom change but are read often. Views, on the other hand, are more expensive to execute but more current.
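The trade-off can be seen in a small sketch using Python's built-in sqlite3 module. The orders table, the two customer_totals_* objects, and the refresh function are assumptions made up for the example: the view reflects new rows immediately, while the precomputed table stays stale until the program refreshes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'Alice', 10.0), (2, 'Alice', 5.0);

    -- View: recomputed on every query, so always current.
    CREATE VIEW customer_totals_view AS
    SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer;

    -- Precomputed table: cheap to read, only as fresh as its last refresh.
    CREATE TABLE customer_totals_table AS
    SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer;
""")


def refresh_totals_table():
    """Rebuild the precomputed table from the current contents of orders."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM customer_totals_table")
        conn.execute(
            "INSERT INTO customer_totals_table "
            "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
        )


conn.execute("INSERT INTO orders VALUES (3, 'Alice', 20.0)")

view_total = conn.execute("SELECT total FROM customer_totals_view").fetchone()[0]
stale_total = conn.execute("SELECT total FROM customer_totals_table").fetchone()[0]
refresh_totals_table()
fresh_total = conn.execute("SELECT total FROM customer_totals_table").fetchone()[0]
```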
I think your usage of "Lookup Table" is slightly awry. In normal parlance a lookup table is a code or reference data table. It might consist of a CODE and a DESCRIPTION or a code expansion. The purpose of such tables is to provide a list of permitted values for restricted columns, things like CUSTOMER_TYPE or PRIORITY_CODE. This category of table is often referred to as "standing data" because it changes very rarely, if at all. The value of defining this data in lookup tables is that they can be used in foreign keys and to populate dropdowns and lists of values.
What you are describing is a slightly different scenario:
"I need information from one table, for example the foreign key of an order table, to get customer details from another table"
Both of these tables are application data tables. Customer and Order records are dynamic. Now it is obviously valid to retrieve additional data from the Customer table to display alongside the Order data, and in that sense Customer is a "lookup table". More pertinently, it is the parent table of Order, because it has the primary key referenced by the foreign key on Order.
By all means build a view to capture the joining logic between Order and Customer. Such views can be quite helpful when building an application that uses the same joined tables in several places.
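A minimal sketch of such a view, using Python's built-in sqlite3 module. The table names (orders, customers), the columns, and the order_details view are assumptions for illustration. Note that the view itself takes no parameters here; the filter value is supplied in the query that reads from the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total       REAL    NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice', 'Leeds'), (2, 'Bob', 'York');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 2, 40.0), (102, 1, 5.0);

    -- The joining logic lives in one place and can be reused everywhere.
    CREATE VIEW order_details AS
    SELECT o.order_id,
           o.total,
           c.name AS customer_name,
           c.city AS customer_city
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id;
""")


def get_order_details(order_id):
    """Fetch one order together with its customer's details, or None."""
    return conn.execute(
        "SELECT order_id, total, customer_name, customer_city "
        "FROM order_details WHERE order_id = ?",
        (order_id,),
    ).fetchone()


details = get_order_details(101)
```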
Here's an example of a lookup table. We have a system that tracks Jurors; one of the tables is JurorStatus. This table contains all the valid StatusCodes for Jurors:
Code: Value
WS : Will Serve
PP : Postponed
EM : Excuse Military
IF : Ineligible Felon
This is a lookup table for the valid codes.
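To show how such a table restricts values, here is a sketch using Python's built-in sqlite3 module. Only JurorStatus and its four codes come from the example above; the Juror table, its columns, and the add_juror helper are hypothetical. With foreign keys enabled, a status code that is not in the lookup table is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE JurorStatus (
        Code  TEXT PRIMARY KEY,
        Value TEXT NOT NULL
    );
    INSERT INTO JurorStatus VALUES
        ('WS', 'Will Serve'),
        ('PP', 'Postponed'),
        ('EM', 'Excuse Military'),
        ('IF', 'Ineligible Felon');

    -- Hypothetical table that references the lookup table.
    CREATE TABLE Juror (
        JurorID    INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        StatusCode TEXT NOT NULL REFERENCES JurorStatus(Code)
    );
""")


def add_juror(name, status_code):
    """Insert a juror; return False if the status code is not a valid one."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Juror (Name, StatusCode) VALUES (?, ?)",
                (name, status_code),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = add_juror("Pat", "WS")
rejected = add_juror("Sam", "XX")  # 'XX' is not in JurorStatus

# Expanding the code into its description is a simple join.
status_text = conn.execute("""
    SELECT s.Value
    FROM Juror AS j
    JOIN JurorStatus AS s ON s.Code = j.StatusCode
    WHERE j.Name = 'Pat'
""").fetchone()[0]
```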
A view is like a query.
Read this tutorial and you may find helpful info when a lookup table is needed:
SQL: Creating a Lookup Table
Just learn to write SQL queries to get exactly what you need. No need to create a view! Views are not a good choice in many instances, especially once you start basing them on other views, where they can kill performance. Do not use views just as a shorthand for query writing.