I have a need to build a schema structure to support a table of contents (so the level of sections / sub-sections could change for each book or document I add)... one of my first thoughts was that I could use a recursive table to handle that. I want to make sure that my structure is normalized, so I was trying to stay away from denormalizing the table of contents data into a single table (and then having to add columns when there are more sub-sections).
It doesn't seem right to build a recursive table, though, and it could be kind of ugly to populate.
Just wanted to get some thoughts on some alternate solutions or if a recursive table is ok.
Thanks,
S
It helps that SQL Server 2008 has both the recursive WITH clause and hierarchyid to make working with hierarchical data easier - I was pointing out to someone yesterday that MySQL doesn't have either, making things difficult...
The most important thing is to review your data - if you can normalize it to be within a single table, great. But don't shoehorn it into a single-table setup - if it needs more tables, then design it that way. The data and usage will show you the correct way to model things.
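For example, if the table of contents does end up as a self-referencing sections table, one recursive WITH query can pull the whole TOC for a document - a minimal sketch, with purely illustrative table, column and parameter names:

-- One row per TOC entry; top-level sections have ParentSectionId = NULL
CREATE TABLE Section (
    SectionId       INT IDENTITY(1,1) PRIMARY KEY,
    DocumentId      INT NOT NULL,               -- FK to the book/document table
    ParentSectionId INT NULL REFERENCES Section (SectionId),
    Title           NVARCHAR(200) NOT NULL,
    SortOrder       INT NOT NULL
);

-- Recursive WITH (SQL Server 2005+) to pull the whole TOC for one document
WITH Toc AS (
    SELECT SectionId, ParentSectionId, Title, SortOrder, 1 AS Depth
    FROM   Section
    WHERE  DocumentId = @DocumentId AND ParentSectionId IS NULL
    UNION ALL
    SELECT s.SectionId, s.ParentSectionId, s.Title, s.SortOrder, t.Depth + 1
    FROM   Section AS s
    JOIN   Toc AS t ON s.ParentSectionId = t.SectionId
)
SELECT SectionId, Title, Depth, SortOrder
FROM   Toc;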
When in doubt, keep it simple. Where you have a collection of similar items, e.g. employees, a table that references itself makes sense. While you can argue here (quite rightly) that each item within the table is a 'section' of one form or another, unless you're comfortable modelling the data as sections and handling the different types of sections through relationships to those entities, I would avoid the complexity of a self-referencing table and stick with a normalized approach.
Here is the problem: I am currently working with a Microsoft Access database that the previous employee created by just adding all the data into one table (yes, all the data into one table). There are about 186 columns in that one table.
I am now responsible for dividing each category of data into its own table. Everything is going fine, although progress is too slow. Is there perhaps an SQL command that will somehow divide each category of data into its proper table? As of now I am manually looking at the main table and carefully transferring groups of data into each respective table along with their proper IDs, making sure data is not corrupted. Here is the layout I have so far:
Note: I am probably one of the very few at my campus with database experience.
I would approach this as a classic normalisation process. Your single hugely wide table should contain all of the entities within your domain, so as long as you understand the domain you should be able to normalise the structure until you're happy with it.
To create your foreign key lookups, run distinct queries against the columns you're going to remove and then add the key values back in.
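As a rough sketch of that step, in SQL Server-style syntax (Access/Jet syntax differs slightly, and every table and column name here is made up):

-- 1. Build a lookup table from the distinct values in the wide table
SELECT DISTINCT Department
INTO   DepartmentLookup
FROM   BigWideTable
WHERE  Department IS NOT NULL;

ALTER TABLE DepartmentLookup
    ADD DepartmentId INT IDENTITY(1,1) NOT NULL PRIMARY KEY;

-- 2. Put the surrogate key back wherever the text value used to live
ALTER TABLE Employee ADD DepartmentId INT NULL;

UPDATE e
SET    e.DepartmentId = d.DepartmentId
FROM   Employee AS e
JOIN   DepartmentLookup AS d ON d.Department = e.Department;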
It sounds like you know what you're doing already? Are you just looking for reassurance that you're on the right track? (Which it looks like you are.)
Good luck though, and enjoy it - it sounds like a good little piece of work.
I need to create a SQL stored procedure (SQL Server 2008 - T-SQL) which copies a node in an adjacency model.
The table can be seen as having two columns, Id and ParentId (an FK to Id). Copying means that all subordinates need to be copied as well.
I think that using WITH is a good start, but I'm curious if I can do this copy without using Cursors.
The fundamental problem with adjacency lists is that there is no general way in SQL to extract an entire subtree, so you already have the problem of identifying all the rows you need to duplicate without resorting to a cursor.
If possible, migrate your adjacency list to a nested set model, which allows you to easily identify all the nodes of a subtree. However, maintaining a nested set model is more complex for general inserts and deletes.
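For comparison, pulling a subtree out of a nested set model is a plain range query (lft/rgt are the conventional column names, not your schema, and @NodeId is just a placeholder parameter):

-- Each node stores left/right boundaries; its subtree is everything
-- whose boundaries fall inside its own.
SELECT child.*
FROM   Node AS parent
JOIN   Node AS child
       ON child.lft BETWEEN parent.lft AND parent.rgt
WHERE  parent.Id = @NodeId;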
EDIT: As pointed out by 'a_horse_with_no_name', there is a way in standard SQL to process adjacency lists: recursive common table expressions.
Copying a whole sub-tree is a bit of a problem because when you copy your sub-tree you are either
denormalizing data or
using it as a template of some sort.
In either case you are dragging data through an inconsistent state at some point - which indicates some problems with your design (for example, do your records need to have multiple parents or not? If yes, then you should consider redesigning).
So, you should update the question with a more complete example of what you are trying to do.
One solution would be to use a temporary table. Selecting for the insert should not be a problem; it is just updating the referenced IDs that is tricky.
So:
INSERT into a temporary table, using the recursive WITH to select the subtree
UPDATE the IDs
INSERT into the original table
DELETE the temp records
The procedure needs to go like this because it would be hard to change the IDs (both the record IDs and the IDs referring to parents) in the initial WITH + INSERT. It might be possible, however, if there were a nice function that depended only on max_id or only on the old IDs.
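A rough T-SQL sketch of those steps (assuming Id is an integer IDENTITY column and that @SourceId / @NewParentId come in as procedure parameters - all names are illustrative and this is untested):

-- 1. WITH + INSERT into a temporary table: grab the subtree to copy
DECLARE @Offset INT = (SELECT MAX(Id) FROM Tree);

WITH SubTree AS (
    SELECT Id, ParentId FROM Tree WHERE Id = @SourceId
    UNION ALL
    SELECT t.Id, t.ParentId
    FROM   Tree AS t
    JOIN   SubTree AS s ON t.ParentId = s.Id
)
SELECT Id, ParentId INTO #Copy FROM SubTree;

-- 2. UPDATE the IDs: shift everything past the current maximum,
--    and repoint the copied root at its new parent
UPDATE #Copy
SET    Id       = Id + @Offset,
       ParentId = CASE WHEN Id = @SourceId THEN @NewParentId
                       ELSE ParentId + @Offset END;

-- 3. INSERT into the original table
SET IDENTITY_INSERT Tree ON;
INSERT INTO Tree (Id, ParentId)
SELECT Id, ParentId FROM #Copy;
SET IDENTITY_INSERT Tree OFF;

-- 4. DELETE the temp records
DROP TABLE #Copy;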
This question is an attempt to find a practical solution for this question.
I need a semi-schemaless design for my SQL database. However, I can limit the flexibility enough to shoehorn it into the SQL paradigm. Moving to schemaless databases might be an option in the future, but right now I'm stuck with SQL.
I have a table in a SQL database (let's call it Foo). When a row is added to it, it needs to be able to store an arbitrary number of "meta" fields with it. An example would be the ability to attach arbitrary metadata like tags, collaborators, etc. All the fields are optional, but the problem is that they're of different types. Some might be numeric, some might be textual, etc.
A simple design linking Foo to a table of OptionalValues with fields like name, value_type, value_string, value_int, value_date etc. seems straightforward, although it descends into the whole EAV model which Alex mentions in that last answer, and it looks quite wasteful. Also, I imagine queries against it will be quite slow once it grows. I don't expect to search or sort by anything in this table though. All I need is that when I get a row out of Foo, these extra attributes should be obtainable as well.
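Concretely, the side table I have in mind would be something along these lines (names are only illustrative):

CREATE TABLE FooMeta (
    FooId        INT NOT NULL REFERENCES Foo (Id),
    Name         NVARCHAR(100) NOT NULL,
    ValueType    TINYINT NOT NULL,        -- discriminator: 1 = string, 2 = int, 3 = date, ...
    ValueString  NVARCHAR(MAX) NULL,
    ValueInt     INT NULL,
    ValueDate    DATETIME NULL,
    PRIMARY KEY (FooId, Name)
);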
Are there any best practices for implementing this kind of a setup in a SQL database or am I simply looking at the whole thing wrongly?
Add a string column "Metafields" to your table "Foo" and store your metadata there as an XML or JSON string.
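For example, with SQL Server's xml type (the element names and the Id value below are just placeholders, and this is only a sketch):

-- Store all optional attributes as one XML blob per row
ALTER TABLE Foo ADD MetaFields XML NULL;

UPDATE Foo
SET    MetaFields = '<meta><tag>urgent</tag><collaborator>alice</collaborator></meta>'
WHERE  Id = 42;

-- Individual values can still be pulled out when needed
SELECT Id,
       MetaFields.value('(/meta/collaborator)[1]', 'nvarchar(100)') AS Collaborator
FROM   Foo;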
I've got a database table that represents a bunch of trees. The first three columns are GUIDs that look like this:
NODE_ID (PK)
PARENT_NODE_ID (FK to same table, references NODE_ID)
TREE_ID (FK to another table)
It's possible to move a node to a different tree. The tricky part is bringing all its child-nodes with it. That takes a recursive update. (And yes, I realize this is kinda bad design in the first place. I didn't design it. I just have to maintain it, and I can't change the database schema.)
It would be nice if I could do the update in SQL, as a stored procedure. But I can't think of how to implement the recursive operation required in set logic, without employing a cursor. Does anyone know of a reasonably simple way to pull this off?
If you are using Postgres or MS SQL 2005 you can use a recursive update; otherwise, you may want to consider using a method other than an adjacency list. I saw a presentation a few weeks ago about these issues and about storing hierarchical data. Here is a link:
http://www.slideshare.net/billkarwin/practical-object-oriented-models-in-sql
Start at slide 40.
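On SQL Server 2005 and later, that recursive update can be done in one statement with a recursive CTE - a sketch, assuming the table is called Nodes and that @MovedNodeId, @NewTreeId and @NewParentNodeId are supplied by the caller:

-- Re-tag the moved node and all of its descendants with the new tree,
-- and repoint the moved node at its new parent
WITH SubTree AS (
    SELECT NODE_ID
    FROM   Nodes
    WHERE  NODE_ID = @MovedNodeId
    UNION ALL
    SELECT n.NODE_ID
    FROM   Nodes AS n
    JOIN   SubTree AS s ON n.PARENT_NODE_ID = s.NODE_ID
)
UPDATE n
SET    n.TREE_ID        = @NewTreeId,
       n.PARENT_NODE_ID = CASE WHEN n.NODE_ID = @MovedNodeId
                               THEN @NewParentNodeId
                               ELSE n.PARENT_NODE_ID END
FROM   Nodes AS n
JOIN   SubTree AS s ON s.NODE_ID = n.NODE_ID;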
I have a data warehouse containing typical star schemas, and a whole bunch of code which does stuff like this (obviously a lot bigger, but this is illustrative):
SELECT cdim.x
,SUM(fact.y) AS y
,dim.z
FROM fact
INNER JOIN conformed_dim AS cdim
ON cdim.cdim_dim_id = fact.cdim_dim_id
INNER JOIN nonconformed_dim AS dim
ON dim.ncdim_dim_id = fact.ncdim_dim_id
INNER JOIN date_dim AS ddim
ON ddim.date_id = fact.date_id
WHERE fact.date_id = #date_id
GROUP BY cdim.x
,dim.z
I'm thinking of replacing it with a view (MODEL_SYSTEM_1, say), so that it becomes:
SELECT m.x
,SUM(m.y) AS y
,m.z
FROM MODEL_SYSTEM_1 AS m
WHERE m.date_id = #date_id
GROUP BY m.x
,m.z
But the view MODEL_SYSTEM_1 would have to contain unique column names, and I'm also concerned about how the optimizer would handle this if I go ahead: since the view would span a whole star and views cannot be parameterized (boy, wouldn't that be cool!), I'm not sure whether predicates in the WHERE clause against the different facts and dimensions would still be optimized properly.
So my questions are -
Is this approach OK, or is it just going to be an abstraction which hurts performance and doesn't give me anything but a lot nicer syntax?
What's the best way to code-gen these views, eliminating duplicate column names (even if the view later needs to be tweaked by hand), given that all the appropriate PKs and FKs are in place? Should I just write some SQL to pull it out of INFORMATION_SCHEMA, or is there a good example already available?
Edit: I have tested it, and the performance seems the same, even on the bigger processes - even joining multiple stars which each use these views.
The automation is mainly because there are a number of these stars in the data warehouse, and the FKs/PKs have been done properly by the designers, but I don't want to have to pick through all the tables or the documentation. I wrote a script to generate the view (it also generates abbreviations for the tables), and it works well for generating the skeleton automagically from INFORMATION_SCHEMA; it can then be tweaked before committing the creation of the view.
If anyone wants the code, I could probably publish it here.
I've used this technique on several data warehouses I look after. I have not noticed any performance degradation when running reports off the views versus hitting the tables directly, but I have never performed a detailed analysis.
I created the views using the designer in SQL Server Management Studio and did not use any automated approach. I can't imagine the schema changing often enough that automating it would be worthwhile anyway. You might spend as long tweaking the results as it would have taken to drag all the tables onto the view in the first place!
To remove ambiguity, a good approach is to prefix the column names with the name of the dimension each belongs to. This is helpful to the report writers and to anyone running ad hoc queries.
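So the view ends up looking something like this (reusing the aliases from the question; the prefixed names are only an example):

CREATE VIEW MODEL_SYSTEM_1
AS
SELECT  fact.y        AS fact_y
       ,fact.date_id  AS date_id
       ,cdim.x        AS conformed_dim_x
       ,dim.z         AS nonconformed_dim_z
FROM fact
INNER JOIN conformed_dim AS cdim
    ON cdim.cdim_dim_id = fact.cdim_dim_id
INNER JOIN nonconformed_dim AS dim
    ON dim.ncdim_dim_id = fact.ncdim_dim_id
INNER JOIN date_dim AS ddim
    ON ddim.date_id = fact.date_id;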
Make the view or views into one or more summary fact tables and materialize them. These only need to be refreshed when the main fact table is refreshed. The materialized views will be faster to query, and this can be a win if you have a lot of queries that can be satisfied by the summary.
You can use the data dictionary or the INFORMATION_SCHEMA views to generate the SQL to create the tables if you have a large number of these summaries or wish to change them frequently.
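If you do go that route, a very rough sketch of the generation step against INFORMATION_SCHEMA (SQL Server flavoured, and the prefix scheme is only an example) could be:

-- Emit one "table.column AS table_column" select-list line per column
-- of the tables that make up the star
SELECT  QUOTENAME(c.TABLE_NAME) + '.' + QUOTENAME(c.COLUMN_NAME)
        + ' AS ' + c.TABLE_NAME + '_' + c.COLUMN_NAME + ','
FROM    INFORMATION_SCHEMA.COLUMNS AS c
WHERE   c.TABLE_NAME IN ('fact', 'conformed_dim', 'nonconformed_dim', 'date_dim')
ORDER BY c.TABLE_NAME, c.ORDINAL_POSITION;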
However, I would guess that it's not likely that you would change these very often so auto-generating the view definitions might not be worth the trouble.
If you happen to use MS SQL Server, you could try an Inline UDF which is as close to a parameterized view as it gets.
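Something along these lines, reusing the tables from the question (the @date_id parameter stands in for #date_id above; this is a sketch, not a tested implementation):

-- Inline table-valued function: the optimizer expands it like a view,
-- but it takes the date as a parameter
CREATE FUNCTION dbo.MODEL_SYSTEM_1 (@date_id INT)
RETURNS TABLE
AS
RETURN
(
    SELECT  cdim.x, fact.y, dim.z, fact.date_id
    FROM fact
    INNER JOIN conformed_dim AS cdim
        ON cdim.cdim_dim_id = fact.cdim_dim_id
    INNER JOIN nonconformed_dim AS dim
        ON dim.ncdim_dim_id = fact.ncdim_dim_id
    WHERE fact.date_id = @date_id
);
GO

-- Usage (assuming @date_id has been declared and set by the caller)
SELECT m.x, SUM(m.y) AS y, m.z
FROM dbo.MODEL_SYSTEM_1(@date_id) AS m
GROUP BY m.x, m.z;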