modelling hierarchical data warehouse dimension


I'm trying to model a dimension that is hierarchical and has an indeterminate number of features at each level. Here's an example of how the tables are structured in the data source:
+-------------+              +-------------+                +-------------+
|Product      +--------------+SubCategory  +----------------+Category     |
+------+------+              +------+------+                +-------+-----+
       |                            |                               |
       |                            |                               |
+------+----------+         +-------+-------------+         +-------+----------+
|Product Features |         |SubCategory Features |         |Category Features |
+-----------------+         +---------------------+         +------------------+
There's a one-to-many relationship from Product -> SubCategory and from SubCategory -> Category. Each of Product, SubCategory, and Category also references its own features table. The number of features is not fixed, however, and could be 0.
The fact tables I'm trying to build need to have product-level grain. Without the features, I could just make a dimension with each of these as columns, like so:
+-----------------+
|Dim_Product |
+-----------------+
|Dim_Product_Id |
|Product |
|SubCategory |
|Category |
|... |
+-----------------+
But then all the features would be lost.
Is it possible to keep the dimension at the product level and keep all the features from each level of the hierarchy? Or would it be necessary to make a bridge table containing all the combinations of all the features in the hierarchy? Would I need to break the levels out into their own dimensions (i.e. Dim_Product, Dim_SubCategory, Dim_Category) instead? There are also fixed attributes for each hierarchy level, so can these just be flattened out and included as columns if a single dimension is a suitable option?

It seems tricky, since you don't know the exact number of features. This is my suggestion, though the bridge table would be quite big.
+-----------------+
|Dim_Product |
+-----------------+
|Dim_Product_Id |
|Product |
|SubCategory |
|Category |
|... |
+-----------------+
|
|
+-----------------+
|BridgeTable |
+-----------------+
|Dim_Product_Id |
|Feature_Id |
+-----------------+
|
|
+-------------------+
|Features           |
+-------------------+
|Feature_Id         |
|FeatureDescription |
|TypeOfFeature      |
+-------------------+
with TypeOfFeature being one of (ProductFeature, SubCategory, Category).
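
The three tables above could be sketched roughly as follows. This is only a minimal illustration, assuming SQLite (through Python's sqlite3 module) as a stand-in for the warehouse; the sample rows ('Road Bike', 'Carbon frame', and so on) are hypothetical and not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Product (
    Dim_Product_Id INTEGER PRIMARY KEY,
    Product        TEXT,
    SubCategory    TEXT,
    Category       TEXT
);
CREATE TABLE Features (
    Feature_Id         INTEGER PRIMARY KEY,
    FeatureDescription TEXT,
    TypeOfFeature      TEXT CHECK (TypeOfFeature IN
                           ('ProductFeature', 'SubCategory', 'Category'))
);
-- One row per (product, feature); a product with no features has no rows here.
CREATE TABLE BridgeTable (
    Dim_Product_Id INTEGER REFERENCES Dim_Product(Dim_Product_Id),
    Feature_Id     INTEGER REFERENCES Features(Feature_Id),
    PRIMARY KEY (Dim_Product_Id, Feature_Id)
);
""")

# Hypothetical sample data: one product with a feature at every level,
# and one product with no features at all.
conn.executemany("INSERT INTO Dim_Product VALUES (?, ?, ?, ?)", [
    (1, "Road Bike", "Bikes", "Sports"),
    (2, "Helmet", "Protection", "Sports"),
])
conn.executemany("INSERT INTO Features VALUES (?, ?, ?)", [
    (1, "Carbon frame", "ProductFeature"),
    (2, "Requires assembly", "SubCategory"),
    (3, "Seasonal", "Category"),
])
conn.executemany("INSERT INTO BridgeTable VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 3)])

# LEFT JOINs keep products that have zero features.
rows = conn.execute("""
    SELECT p.Product, f.TypeOfFeature, f.FeatureDescription
    FROM Dim_Product p
    LEFT JOIN BridgeTable b ON b.Dim_Product_Id = p.Dim_Product_Id
    LEFT JOIN Features f    ON f.Feature_Id = b.Feature_Id
    ORDER BY p.Dim_Product_Id, f.Feature_Id
""").fetchall()

for row in rows:
    print(row)
```

The fact table would still join to Dim_Product at product grain; the bridge is only touched when a query needs the features.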

Related

How can I get a list of any UDFs referenced by a query job?

I know that the INFORMATION_SCHEMA views make it possible to find the tables a query job referenced, but I can't figure out how to get the list of UDFs that the same kind of job referenced.
Is there a way to get a relationship of Job <-> UDFs like this?
+========+=====================+
| Job Id | UDFs |
+--------+---------------------+
|job1 | myproject.udf.UDF_a |
| | myproject.udf.UDF_b |
+--------+---------------------+
|job2 | myproject.udf.UDF_b |
| | myproject.udf.UDF_c |
+--------+---------------------+
|job3 | myproject.udf.UDF_c |
+========+=====================+

How to create Header over Header

Table:
id | BL      | ML   | BL      | ML
---------------------------------------
1  | Field01 | Name | Field34 | Field36
2  | Field02 | Age  | Field35 | Field37
Required result:
Id | Open           | Closed
---------------------------------------
   | BL      | ML   | BL      | ML
---------------------------------------
1  | Field01 | Name | Field34 | Field36
2  | Field02 | Age  | Field35 | Field37

This is not possible in just SQL Server. You should be handling this type of display within your presentation tool.

It seems odd to do this kind of thing in SQL Server, and I don't think it is even possible there. If you have access to Python, pandas can do it with what it calls a MultiIndex. This may be going off on a tangent, but the link below explains how to create one.
https://jakevdp.github.io/PythonDataScienceHandbook/03.05-hierarchical-indexing.html
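
As a rough sketch of that idea, assuming pandas is installed and the data has already been fetched from the database into plain Python lists, the two-level header from the question could be built like this:

```python
import pandas as pd

# Two-level column header: the top level groups BL/ML under Open and Closed.
columns = pd.MultiIndex.from_product([["Open", "Closed"], ["BL", "ML"]])

# Rows copied from the table in the question.
rows = [
    ["Field01", "Name", "Field34", "Field36"],
    ["Field02", "Age", "Field35", "Field37"],
]
df = pd.DataFrame(rows, index=pd.Index([1, 2], name="Id"), columns=columns)

print(df)

# Selecting a top-level label returns just the columns grouped under it.
open_part = df["Open"]
```

Whether this helps depends on where the result is displayed; it only changes the presentation layer, not what SQL Server returns.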

RDBMS schema for unknown columns

I have a project with a MySQL database, and I would like to be able to upload various datasets. Say I am building a restaurant reviews aggregator. We would like to keep adding every source of restaurant reviews we can get our hands on, while keeping all of the information each one provides.
I have a table review_sources
==========================
| id | name              |
==========================
| 1  | Zagat             |
| 2  | GoodEats Magazine |
| ...                    |
| 50 | Allergy News      |
==========================
Now say I have a table reviews
=================================================================
| id  | Restaurant Name | source_id | Star Rating | Description |
=================================================================
| 0   | Joey's Burgers  | 1         | 3.5         | Wow!        |
| 1   | Jamal's Steaks  | 1         | 3.5         | Yummy!      |
| 2   | Jenny's Crepes  | 1         | 4.5         | Sweet!      |
| ...                                                           |
| 253 | Jeeva's Curries | 3         | 4           | Spicy!      |
=================================================================
Now suppose someone wants to add reviews from "Allergy News", which have a field "nut-free". Or a source of reviews could describe the degree of kashrut compliance, halal compliance, or vegan-friendliness. As the designer, I don't know which optional fields future data sources may have. I want to be able to answer queries such as:
What are all the fields in the Zagat reviews?
For review id=x, what is value of the optional field "vegan-friendly"?
So how do I design a schema that can handle these disparate data sources and answer these queries? My reasons for not going for NoSQL are that I do want certain types of normalization, and that this is part of an existing MySQL based project.

I'd use a many-to-many relationship with a table containing a review_id, a field (e.g. "vegan-friendly") and the value of the field. Then of course a reviews_fields table to map one to the other.
Cheers
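
One possible reading of this suggestion, sketched with SQLite (via Python's sqlite3) rather than MySQL so it runs standalone. The fields table, the column names, and the review with id 300 are assumptions made for illustration; only reviews_fields is named in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE review_sources (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE reviews (
    id              INTEGER PRIMARY KEY,
    restaurant_name TEXT NOT NULL,
    source_id       INTEGER NOT NULL REFERENCES review_sources(id),
    star_rating     REAL,
    description     TEXT
);
-- Catalogue of optional field names (hypothetical table name).
CREATE TABLE fields (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Maps a review to a field and stores that field's value for the review.
CREATE TABLE reviews_fields (
    review_id INTEGER NOT NULL REFERENCES reviews(id),
    field_id  INTEGER NOT NULL REFERENCES fields(id),
    value     TEXT,
    PRIMARY KEY (review_id, field_id)
);
""")

conn.executemany("INSERT INTO review_sources VALUES (?, ?)",
                 [(1, "Zagat"), (50, "Allergy News")])
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?, ?)", [
    (0, "Joey's Burgers", 1, 3.5, "Wow!"),
    (300, "Hypothetical Cafe", 50, 4.0, "Safe!"),  # made-up row
])
conn.executemany("INSERT INTO fields VALUES (?, ?)",
                 [(1, "nut-free"), (2, "vegan-friendly")])
conn.executemany("INSERT INTO reviews_fields VALUES (?, ?, ?)",
                 [(0, 2, "partly"), (300, 1, "yes"), (300, 2, "no")])

def fields_for_source(source_name):
    """All optional fields that appear in reviews from one source."""
    rows = conn.execute("""
        SELECT DISTINCT f.name
        FROM fields f
        JOIN reviews_fields rf ON rf.field_id = f.id
        JOIN reviews r         ON r.id = rf.review_id
        JOIN review_sources s  ON s.id = r.source_id
        WHERE s.name = ?
        ORDER BY f.name
    """, (source_name,)).fetchall()
    return [name for (name,) in rows]

def field_value(review_id, field_name):
    """Value of one optional field for one review, or None if absent."""
    row = conn.execute("""
        SELECT rf.value
        FROM reviews_fields rf
        JOIN fields f ON f.id = rf.field_id
        WHERE rf.review_id = ? AND f.name = ?
    """, (review_id, field_name)).fetchone()
    return row[0] if row else None

zagat_fields = fields_for_source("Zagat")
vegan_value = field_value(300, "vegan-friendly")
```

The two helper functions correspond to the two queries in the question. Storing every value as text is the usual trade-off of this layout; typed value columns would be a further design decision.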

Database design affected by UI needs: Should I make an extra field or a new relation?

I have this relation:
1 Incident Report has N Documents. The Incident Report could have a field like PupilName backed by a PupilID.
OR
I could change the design to:
1 Pupil has N Incident Reports etc...
My concern is that I do not like the extra table, because my GUI uses a simple grouped DataGrid for the Incident Reports. I would then have a Pupil entity just to display the pupil's name in the group header, with the Incident Reports in the data rows below. Of course that would not be possible, as I cannot display a 1:N relation in a DataGrid!
What would you do?

Lisa, I see no design problems in the question.
Pupil               IncidentReport               Document
+---------+         +------------------+         +------------------+
| PupilID |         | IncidentReportID |         | DocumentID       |
+---------+         +------------------+         +------------------+
| Name    | -|---<- | PupilID          | -|---<- | IncidentReportID |
| ...     |         | ...              |         | ...              |
+---------+         +------------------+         +------------------+
Although I may not quite understand the question.
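
If the worry is only about feeding a flat, grouped grid, a join can flatten the Pupil -> IncidentReport relation into one row per report, with the pupil's name repeated as the grouping column. A minimal sketch, assuming SQLite via Python's sqlite3; the Title column and the sample rows are hypothetical:

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pupil (PupilID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE IncidentReport (
    IncidentReportID INTEGER PRIMARY KEY,
    PupilID          INTEGER NOT NULL REFERENCES Pupil(PupilID),
    Title            TEXT  -- hypothetical column, for illustration only
);
""")
conn.executemany("INSERT INTO Pupil VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])
conn.executemany("INSERT INTO IncidentReport VALUES (?, ?, ?)", [
    (10, 1, "Late arrival"),
    (11, 1, "Broken window"),
    (12, 2, "Lost book"),
])

# One flat row per incident report; the pupil's name is just another column,
# so a grid can group on it without knowing about the Pupil table.
flat_rows = conn.execute("""
    SELECT p.Name, r.IncidentReportID, r.Title
    FROM IncidentReport r
    JOIN Pupil p ON p.PupilID = r.PupilID
    ORDER BY p.Name, r.IncidentReportID
""").fetchall()

# Emulate the grouped display: group header -> report titles beneath it.
grouped = {
    name: [title for _, _, title in reports]
    for name, reports in groupby(flat_rows, key=lambda row: row[0])
}
```

This keeps the normalized tables from the diagram and leaves the flattening to the query that feeds the grid.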

unique constraint (w/o Trigger) on "one-to-many" relation

To illustrate the problem, here is an example:
A tag_bundle consists of one or more tags.
Each unique tag combination maps to exactly one tag_bundle, and vice versa.
tag_bundle         tag         tag_bundle_relation
+---------------+  +--------+  +---------------+--------+
| tag_bundle_id |  | tag_id |  | tag_bundle_id | tag_id |
+---------------+  +--------+  +---------------+--------+
| 1             |  | 100    |  | 1             | 100    |
+---------------+  +--------+  +---------------+--------+
| 2             |  | 101    |  | 1             | 101    |
+---------------+  +--------+  +---------------+--------+
                   | 102    |  | 2             | 101    |
                   +--------+  +---------------+--------+
                               | 2             | 102    |
                               +---------------+--------+
There can't be another tag_bundle with exactly the combination of tag 100 and tag 101.
There can't be another tag_bundle with exactly the combination of tag 101 and tag 102.
How can I enforce such a unique constraint when SQL is executed concurrently, that is, prevent two bundles with exactly the same tag combination from being added at the same time?
Adding a simple unique constraint on any one of the tables does not work.
Is there any solution other than a trigger or an explicit lock?
The only approach I have come up with is this simple one: turn the tag combination into a string and make it a unique column.
tag_bundle (unique on tags)    tag         tag_bundle_relation
+---------------+-----------+  +--------+  +---------------+--------+
| tag_bundle_id | tags      |  | tag_id |  | tag_bundle_id | tag_id |
+---------------+-----------+  +--------+  +---------------+--------+
| 1             | "100,101" |  | 101    |  | 1             | 101    |
+---------------+-----------+  +--------+  +---------------+--------+
                               | 100    |  | 1             | 100    |
                               +--------+  +---------------+--------+
but it seems not a good way :(

Why the constraint of 'without a trigger'? With a trigger, combined with a bit of data duplication, you can get what you need. Change the 'tags' field in your solution to an array of INTEGERs (or whatever type tag_id is).
While I recognise the unpleasantness of the solution, I don't see a way round it. I would use an array instead of a string for 'tags', put it in a separate table from tag_bundle, still make it unique, and put a trigger on tag_bundle_relation that updates the tags field with array_agg(tag_id) (8.4 or later). If that update fails, make the trigger fail as well.

For this to work correctly when multiple transactions update the tables concurrently, you will need to create a deferrable, initially deferred, constraint trigger.
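
For comparison, here is a small sketch of the unique-column idea from the question itself, not of the trigger approach described above. It assumes SQLite via Python's sqlite3 and lets the application build the key; canonical_tags and create_bundle are hypothetical helper names. The important detail is that the tag ids are sorted before being joined, so the same combination always produces the same string regardless of input order.

```python
import sqlite3

def canonical_tags(tag_ids):
    """Return an order-independent key such as '100,101' for a set of tag ids."""
    return ",".join(str(t) for t in sorted(set(tag_ids)))

def create_bundle(conn, tag_ids):
    """Insert a bundle and its relation rows in one transaction.

    Raises sqlite3.IntegrityError if a bundle with the same tag set exists.
    """
    with conn:  # commits on success, rolls back on any exception
        cur = conn.execute(
            "INSERT INTO tag_bundle (tags) VALUES (?)",
            (canonical_tags(tag_ids),),
        )
        bundle_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO tag_bundle_relation (tag_bundle_id, tag_id) VALUES (?, ?)",
            [(bundle_id, t) for t in sorted(set(tag_ids))],
        )
    return bundle_id

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag_bundle (
    tag_bundle_id INTEGER PRIMARY KEY,
    tags          TEXT NOT NULL UNIQUE   -- canonical, sorted tag list
);
CREATE TABLE tag_bundle_relation (
    tag_bundle_id INTEGER NOT NULL REFERENCES tag_bundle(tag_bundle_id),
    tag_id        INTEGER NOT NULL,
    PRIMARY KEY (tag_bundle_id, tag_id)
);
""")

first_id = create_bundle(conn, [100, 101])

# Same combination in a different order: the UNIQUE column rejects it.
try:
    create_bundle(conn, [101, 100])
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Because the uniqueness check is an ordinary unique index, the database itself arbitrates between concurrent inserts. The cost is the duplicated data, and the key has to be kept in step with tag_bundle_relation whenever a bundle changes.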
