I've got 3 fact tables, each having a dimension called Category. Let's say Fact1-Category1, Fact2-Category2 and Fact3-Category3. Category1 and Category2 may have some common values, Category1 and Category3 have some common values, and so on. Now I want to map all the categories to all the fact tables. For example, if the value "Homer" exists for Category1, Category2 and Category3, then whenever I select any of the three categories as a filter, the fact values should appear accordingly. I'm trying to implement a reference relationship here, but it's not working.
I would collapse the 3 dimensions into a common "Category" dimension, and relate it to each Fact.
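A minimal sketch of that conformed-dimension idea in plain SQL via SQLite (all table and column names here are illustrative assumptions, not from the original cube): one shared dim_category holds the union of the three category value lists, and each fact table carries a foreign key to it, so a single filter slices all three facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE fact1 (category_id INTEGER REFERENCES dim_category, amount REAL);
CREATE TABLE fact2 (category_id INTEGER REFERENCES dim_category, amount REAL);
CREATE TABLE fact3 (category_id INTEGER REFERENCES dim_category, amount REAL);

-- 'Homer' exists in all three source category lists, so it appears once here
INSERT INTO dim_category (name) VALUES ('Homer'), ('Marge'), ('Bart');
INSERT INTO fact1 VALUES (1, 10.0), (2, 5.0);
INSERT INTO fact2 VALUES (1, 20.0);
INSERT INTO fact3 VALUES (1, 30.0), (3, 7.0);
""")

# One filter value on the shared dimension now reaches every fact table.
cur.execute("""
SELECT (SELECT IFNULL(SUM(amount), 0) FROM fact1 f JOIN dim_category d
          ON d.category_id = f.category_id WHERE d.name = 'Homer'),
       (SELECT IFNULL(SUM(amount), 0) FROM fact2 f JOIN dim_category d
          ON d.category_id = f.category_id WHERE d.name = 'Homer'),
       (SELECT IFNULL(SUM(amount), 0) FROM fact3 f JOIN dim_category d
          ON d.category_id = f.category_id WHERE d.name = 'Homer')
""")
homer_totals = cur.fetchone()
```

In SSAS terms, the equivalent move is pointing all three measure groups at one Category dimension with a regular relationship instead of three separate referenced dimensions.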
I have a cube with an n-to-n relationship to categories.
Each product can belong to many categories. In this scenario my product table looks like this:
There is one product in 3 different categories:
- Fashion
  - Men
  - Women
- Electronic
The Bridge_ProductCategories table has 3 rows mapping each category to this product. So I have 3 records in the FactSales table showing the sales amount of this product.
In cube browsing when I filter the "Electronic" category, everything works fine.
But when I filter on the "Fashion" category, the sales amount is duplicated because the product belongs to 2 of its subcategories.
Does anyone have any solution for this situation?
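The question is about an SSAS cube, but the double counting itself is easy to reproduce in plain SQL. A sketch, with hypothetical table names and made-up data, showing the naive join counting the single sale twice and an EXISTS-based filter that counts it once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, name TEXT, parent TEXT);
CREATE TABLE bridge_product_categories (product_id INTEGER, category_id INTEGER);
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);

INSERT INTO categories VALUES (1, 'Men', 'Fashion'), (2, 'Women', 'Fashion'),
                              (3, 'Electronic', NULL);
INSERT INTO bridge_product_categories VALUES (1, 1), (1, 2), (1, 3);
INSERT INTO fact_sales VALUES (1, 100.0);
""")

# Naive join: the product matches both Fashion subcategories,
# so the single 100.0 sale is counted twice.
cur.execute("""
SELECT SUM(f.amount)
FROM fact_sales f
JOIN bridge_product_categories b ON b.product_id = f.product_id
JOIN categories c ON c.category_id = b.category_id
WHERE c.parent = 'Fashion'
""")
naive_total = cur.fetchone()[0]

# EXISTS-based filter: each sale is counted at most once, no matter
# how many matching categories the product belongs to.
cur.execute("""
SELECT SUM(f.amount)
FROM fact_sales f
WHERE EXISTS (SELECT 1
              FROM bridge_product_categories b
              JOIN categories c ON c.category_id = b.category_id
              WHERE b.product_id = f.product_id AND c.parent = 'Fashion')
""")
dedup_total = cur.fetchone()[0]
```

In SSAS itself, the analogous fix is modeling the bridge as a many-to-many dimension relationship, which lets the engine deduplicate the measure across the bridge.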
I am trying to find the best DB design for the following problem.
I have 20,000 data sets which look like this:
1. id, name, color, width, xxxx ... 150 attributes
2. id, name, color, width, xxxx ... 150 attributes
3. ...
which means I have 20,000 entities and 150 attributes (color, width, etc.) for each of them.
I need all these attributes, though maybe 15 are used more than the others. This is used in a web application and it has to perform.
Solutions I thought about:
[1] normalized two-table approach (one-to-one):
    id, name and a few "more important" attributes in one main table
    in another table (one-to-one relation): id and the other, less important attributes, each in its own column
[2] everything in one monster table:
    id, name, color, width ...
[3] normalized two-table approach (one-to-many):
    main table with: id and name
    another table (one-to-many relation) with: id, attr_name, value
I like [3] most, but I am not sure it is going to perform if I need a lot of data, because every "id" has 150 values, and I would have to do things like
SELECT mt.id, mt.name, at.attr_name, at.value
FROM main_table mt
INNER JOIN attr_table at ON at.id = mt.id
AND at.attr_name IN ('width', 'color', 'a', 'b', 'c' .....)
AND at.id IN (1,3,9...)
ORDER BY 1
Having maybe 15-20 different values in "attr_name IN (...)" does not look optimal, and if I need 10-30 different data sets (which I usually do), it looks even less appealing.
The output of this would probably be 200-300 rows, and I would have to reshape it back into one record per entity in code.
[2] is pretty dirty and simple, but I am not sure how it performs, and having 150 columns in one monster table also does not look optimal.
What I like about this approach is that I can do a lot of work in SQL rather than later in code, e.g. expressions over attributes like "max_width - width" or "weight - max_weight/4".
[1] I don't like, because it does not seem clean to have "some" attributes in one table and all the other attributes of the same kind in another.
What is the best solution for this specific problem?
I found some similar but not same questions:
Best to have hundreds of columns or split into multiple tables?
Is it better to have many columns, or many tables?
With as few rows as 20,000 I would have no doubt about going fully normalized. In my opinion, even with many more rows I would still do it, instead of taking on JSON's whole new set of problems and weaknesses.
output of this would be probably 200-300 lines and I would have to normalize this output in the code
Create a view so you don't have to repeat the join in every query:
create view the_view as
select id, mt.name, at.attr_name, at.value
from main_table mt
inner join attr_table at using (id)
Filter when selecting. Note that AND-ing conditions on different attr_name values directly can never match, because each row holds a single attribute; instead, match the wanted pairs with OR and require that all of them are present per id:
select id
from the_view
where ((attr_name = 'color' and "value" = 'red')
    or (attr_name = 'width' and "value" = '30'))
  and id in (1, 3, 9)
group by id
having count(distinct attr_name) = 2
150 columns suggests that the attribute list is not stable. If you create a table with 150 columns, you will always be altering the table to add new columns. The normalized approach is flexible: you create attributes at will, simply by adding a row to a table.
There should be 3 tables: main_table, main_table_attr_table and attr_table. The main_table_attr_table is an n-to-n connection between the other two.
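A runnable sketch of the normalized attribute-row approach, using the question's two-table shape and made-up data, including the "all pairs must match" filter from the answer above expressed as a grouped query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE main_table (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attr_table (id INTEGER REFERENCES main_table,
                         attr_name TEXT, value TEXT,
                         PRIMARY KEY (id, attr_name));

INSERT INTO main_table VALUES (1, 'widget'), (3, 'gadget'), (9, 'gizmo');
INSERT INTO attr_table VALUES
  (1, 'color', 'red'),  (1, 'width', '30'),
  (3, 'color', 'red'),  (3, 'width', '40'),
  (9, 'color', 'blue'), (9, 'width', '30');
""")

# Relational division: keep only the ids matching ALL requested
# attribute/value pairs (here: color=red AND width=30).
cur.execute("""
SELECT id
FROM attr_table
WHERE (attr_name = 'color' AND value = 'red')
   OR (attr_name = 'width' AND value = '30')
GROUP BY id
HAVING COUNT(DISTINCT attr_name) = 2
""")
matching_ids = sorted(row[0] for row in cur.fetchall())
```

The HAVING count must equal the number of requested pairs; the (id, attr_name) primary key guarantees each pair can match at most once per entity.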
I have multiple models (events, chores, bills, and lists), each of which has its own table. I want to be able to group any of these instances together, for example group an event with a list of items to buy for it, and a bill for the cost.
I was thinking each table could have a group id, and I could get other items in a group by merging records from each table where the group_id equals the items group_id.
group = Event.where(group_id: self.group_id) + Bill.where(group_id: self.group_id) ...
But that seems like a bad way to do it.
Another way I thought to do it was to use a polymorphic relation between two of the items
tag
item_1_id | item_1_type | item_2_id | item_2_type
----------+-------------+-----------+------------
But the example above (a group of three different items) would require six records, two between each pair, for each item to know of all the other items in the group.
Is there a way to do this with joins, should I redesign some of the tables?
How do I represent the following data model in SQL tables?
I have 3 entities, company, productcategory and product.
The business model is that a company can have product categories 1-N, and each category can have many products.
The trick is that products are shared across companies under different categories. Product categories are not shared; each company has its own categories.
for example,
product1 belongs to category1 under company1
product1 belongs to category2 under company2
I'm thinking of having the following tables. Only relevant Id fields are shown below.
Company
CompanyId
ProductCategory
ProductCategoryId
CompanyId
ParentCategoryId (To support levels)
Product
ProductId
ProductCategoryXProduct
ProductCategoryId
ProductId
This way I can query for all product categories for a product and filter by company to get that company's specific category structure for its products. This may be different for another company even if the product is the same.
Will this cover it? Is there a better approach?
Looks like a fine 3NF design that fits what you have described.
Note that as your data set grows, this design will start slowing down (mostly due to the required joins), so when the time comes you may want to denormalize some of these tables for faster reads.
Assuming you have the need for products to belong to multiple categories I think that this structure is fine.
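A runnable sketch of the proposed schema with made-up sample data, showing that the same product resolves to a different category depending on which company is asking:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE company (company_id INTEGER PRIMARY KEY);
CREATE TABLE product_category (
  product_category_id INTEGER PRIMARY KEY,
  company_id          INTEGER REFERENCES company,
  parent_category_id  INTEGER REFERENCES product_category,  -- supports levels
  name                TEXT
);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_category_x_product (
  product_category_id INTEGER REFERENCES product_category,
  product_id          INTEGER REFERENCES product,
  PRIMARY KEY (product_category_id, product_id)
);

INSERT INTO company VALUES (1), (2);
INSERT INTO product_category VALUES
  (1, 1, NULL, 'category1'), (2, 2, NULL, 'category2');
INSERT INTO product VALUES (1, 'product1');
-- product1 is shared, but sits under a different category per company
INSERT INTO product_category_x_product VALUES (1, 1), (2, 1);
""")

# The category of product1 as seen by company 2
cur.execute("""
SELECT pc.name
FROM product_category_x_product x
JOIN product_category pc ON pc.product_category_id = x.product_category_id
WHERE x.product_id = 1 AND pc.company_id = 2
""")
company2_category = cur.fetchone()[0]
```

Because company_id lives on product_category rather than on the link table, the "categories are per-company, products are shared" rule falls out of the joins automatically.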
I've been doing quite a bit of searching, but haven't been able to find many resources on the topic. My goal is to store scheduling data like you would find in a Gantt chart. So one example of storing the data might be:
Task Id | Name   | Duration
--------+--------+---------
1       | Task A | 1
2       | Task B | 3
3       | Task C | 2

Task Id | Predecessors
--------+-------------
1       | Null
2       | Null
3       | 1
3       | 2
Which would have Task C waiting for both Task A and Task B to complete.
So my question is: What is the best way to store this kind of data and efficiently query it? Any good resources for this kind of thing? There is a ton of information about tree structures, but once you add in multiple parents it becomes hard to find info. By the way, I'm working with SQL Server and .NET for this task.
Your problem is related to the concept of relationship cardinality. All relationships have some cardinality, which expresses the number of instances on each side that can participate in a single instance of the relationship. As an example, for people (for most living things, I guess, with rare exceptions), the Parent-Child relationship has a cardinality of 2 to zero-or-many: it takes two parents on the parent side, and there can be zero or many children (perhaps it should be 2 to one-or-many).
In database design, generally, anything that has a 1(one), (or a zero or one), on one side can be easily represented with just two tables, one for each entity, (sometimes only one table is needed see note**) and a foreign key column in the table representing the "many" side, that points to the other table holding the entity on the "one" side.
In your case you have a many-to-many relationship. (A task can have multiple predecessors, and each predecessor can certainly be the predecessor of multiple tasks.) In this case a third table is needed, where each row effectively represents an association between 2 tasks, recording that one is the predecessor of the other. Generally, this table contains only the primary key columns of the two parent tables, and its own primary key is a composite of all of them. In your case it simply has two columns, TaskId and PredecessorTaskId, and this pair of ids should be unique within the table, so together they form the composite PK.
When querying, to avoid double counting data columns in the parent tables when there are multiple joins, simply base the query on the parent table. E.g., to find the duration of the longest predecessor:
Assuming your association table is named TaskPredecessor
Select T.TaskId, Max(P.Duration)
From Task T Join Task P
  On P.TaskId In (Select PredecessorTaskId
                  From TaskPredecessor
                  Where TaskId = T.TaskId)
Group By T.TaskId
** NOTE: In cases where both entities in the relationship are of the same entity type, they can both be in the same table. The canonical (luv that word) example is an employee table with the many-to-one relationship of worker to supervisor. Since the supervisor is also an employee, both workers and supervisors can be in the same [Employee] table, and the relationship can be modeled with a foreign key (called, say, SupervisorId) that points to another row in the same table and contains the id of the employee record for that employee's supervisor.
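A runnable sketch of this association-table design, using the question's sample data and a grouped longest-direct-predecessor query (the join is written explicitly here rather than via a subquery, but it computes the same thing):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Task (TaskId INTEGER PRIMARY KEY, Name TEXT, Duration INTEGER);
CREATE TABLE TaskPredecessor (
  TaskId            INTEGER REFERENCES Task,
  PredecessorTaskId INTEGER REFERENCES Task,
  PRIMARY KEY (TaskId, PredecessorTaskId)   -- the composite PK
);

INSERT INTO Task VALUES (1, 'Task A', 1), (2, 'Task B', 3), (3, 'Task C', 2);
-- Task C waits for both Task A and Task B
INSERT INTO TaskPredecessor VALUES (3, 1), (3, 2);
""")

# Longest direct predecessor duration, per task that has predecessors
cur.execute("""
SELECT T.TaskId, MAX(P.Duration)
FROM Task T
JOIN TaskPredecessor tp ON tp.TaskId = T.TaskId
JOIN Task P ON P.TaskId = tp.PredecessorTaskId
GROUP BY T.TaskId
""")
longest = dict(cur.fetchall())
```

For Task C the maximum over its predecessors' durations (1 for Task A, 3 for Task B) is 3, which is the earliest time it could start if both run in parallel.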
Use the adjacency list model:

chain

task_id | predecessor
--------+------------
3       | 1
3       | 2
and this query to find all predecessors of the given task:
WITH q AS
(
SELECT predecessor
FROM chain
WHERE task_id = 3
UNION ALL
SELECT c.predecessor
FROM q
JOIN chain c
ON c.task_id = q.predecessor
)
SELECT *
FROM q
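This recursive CTE can be tried directly in SQLite, which spells it WITH RECURSIVE (SQL Server accepts plain WITH). The sample chain data is made up, with an extra task 4 depending on task 3, so the recursion has more than one level to walk:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE chain (task_id INTEGER, predecessor INTEGER);
INSERT INTO chain VALUES (3, 1), (3, 2), (4, 3);
""")

# All direct and transitive predecessors of task 4
cur.execute("""
WITH RECURSIVE q AS
(
  SELECT predecessor
  FROM chain
  WHERE task_id = 4
  UNION ALL
  SELECT c.predecessor
  FROM q
  JOIN chain c ON c.task_id = q.predecessor
)
SELECT predecessor FROM q ORDER BY predecessor
""")
preds = [row[0] for row in cur.fetchall()]
```

The anchor member finds task 4's direct predecessor (3); the recursive member then keeps joining back into chain until no further predecessors exist, yielding 1 and 2 as well.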
To get the duration of the longest parent (or more distant ancestor) for each task, carry the original task_id through the recursion alongside the current ancestor, then group at the end:
WITH q AS
(
SELECT task_id, task_id AS ancestor_id, duration
FROM tasks
UNION ALL
SELECT q.task_id, t.task_id, t.duration
FROM q
JOIN chain c
ON c.task_id = q.ancestor_id
JOIN tasks t
ON t.task_id = c.predecessor
)
SELECT task_id, MAX(duration)
FROM q
GROUP BY task_id
Check "Hierarchical Weighted total" pattern in "SQL design patterns" book, or "Bill Of Materials" section in "Trees and Hierarchies in SQL".
In a word, graphs feature double aggregation: one kind of aggregation along the nodes of each path, and another across the alternative paths. For example, finding the minimal distance between two nodes is a minimum over summation. The hierarchical weighted total query (aka Bill Of Materials) is a multiplication of the quantities along each path, and a summation across the alternative paths:
with TCAssembly as (
select Part, SubPart, Quantity AS factoredQuantity
from AssemblyEdges
where Part = 'Bicycle'
union all
select te.Part, e.SubPart, e.Quantity * te.factoredQuantity
from TCAssembly te, AssemblyEdges e
where te.SubPart = e.Part
) select SubPart, sum(factoredQuantity) from TCAssembly
group by SubPart
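The Bill-of-Materials pattern can be exercised in SQLite with a made-up bicycle assembly (2 wheels, 32 spokes per wheel, 1 frame): quantities multiply along each path, then sum per subpart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE AssemblyEdges (Part TEXT, SubPart TEXT, Quantity INTEGER);
INSERT INTO AssemblyEdges VALUES
  ('Bicycle', 'Wheel', 2),
  ('Wheel',   'Spoke', 32),
  ('Bicycle', 'Frame', 1);
""")

# Hierarchical weighted total: multiply quantities along each path,
# then sum the factored quantities per subpart.
cur.execute("""
WITH RECURSIVE TCAssembly AS (
  SELECT Part, SubPart, Quantity AS factoredQuantity
  FROM AssemblyEdges
  WHERE Part = 'Bicycle'
  UNION ALL
  SELECT te.Part, e.SubPart, e.Quantity * te.factoredQuantity
  FROM TCAssembly te
  JOIN AssemblyEdges e ON te.SubPart = e.Part
)
SELECT SubPart, SUM(factoredQuantity)
FROM TCAssembly
GROUP BY SubPart
""")
totals = dict(cur.fetchall())
```

The 64 spokes come from the multiplication along the path Bicycle -> Wheel -> Spoke (2 * 32); with multiple paths to the same subpart, the GROUP BY would add their contributions together.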