How to design a database schema with type and subtype - sql

I've read plenty of supertype/subtype threads and I'm pretty sure I am not asking the same one.
I have the following tables in my database. Note that:
1. Some security types only need a Type and require no SubType, such as stocks and ETFs.
2. Securities.TypeId is a foreign key pointing to Type.ID.
3. Securities.SubTypeId has no foreign key relationship to the BondType or DerivativeType tables; data integrity is currently maintained by C# code.
Since the missing foreign key relationship is bad, I want to refactor this DB to add it. Given that this DB is already in production, what's the best way to improve it while limiting the risk? For example, one way would be to combine all XXXType tables into a single table and renumber all SubTypeIds, but that clearly involves updating tons of records in the Securities table, so it is riskier than an approach that doesn't require changing any values.
[Securities]
ID  Name    TypeId  SubTypeId
1   Stock1  2       NULL
2   Fund1   3       NULL
3   Bond1   1       3
4   Deriv1  4       3

[Type]
ID  Name
1   Bond
2   Stock
3   ETF
4   Derivative

[BondType]
ID  Name
...
2   GovernmentBond
3   CorporateBond
4   MunicipalBond
...

[DerivativeType]
ID  Name
...
2   Future
3   Option
4   Swap
...
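One commonly used option that adds real foreign keys without renumbering anything is sketched below, using Python's built-in sqlite3 module as a stand-in for the production database (the question doesn't name the DBMS; the C# code suggests SQL Server, where the DDL syntax would differ). Assumptions: SubTypeId is split into one nullable FK column per subtype family (BondTypeId and DerivativeTypeId are hypothetical names), the existing values are copied over unchanged, CHECK constraints tie each column to its TypeId (1 = Bond, 4 = Derivative, from the sample data), and a hypothetical compatibility view, SecuritiesCompat, re-exposes a single SubTypeId so existing read paths keep working. Only the rows shown in the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE Type           (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE BondType       (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE DerivativeType (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
INSERT INTO Type VALUES (1, 'Bond'), (2, 'Stock'), (3, 'ETF'), (4, 'Derivative');
INSERT INTO BondType VALUES (2, 'GovernmentBond'), (3, 'CorporateBond'), (4, 'MunicipalBond');
INSERT INTO DerivativeType VALUES (2, 'Future'), (3, 'Option'), (4, 'Swap');

-- Current (legacy) shape: SubTypeId has no FK.
CREATE TABLE Securities_old (ID INTEGER PRIMARY KEY, Name TEXT, TypeId INTEGER, SubTypeId INTEGER);
INSERT INTO Securities_old VALUES
    (1, 'Stock1', 2, NULL), (2, 'Fund1', 3, NULL), (3, 'Bond1', 1, 3), (4, 'Deriv1', 4, 3);

-- Hypothetical target shape: one nullable FK column per subtype family.
CREATE TABLE Securities (
    ID               INTEGER PRIMARY KEY,
    Name             TEXT NOT NULL,
    TypeId           INTEGER NOT NULL REFERENCES Type(ID),
    BondTypeId       INTEGER REFERENCES BondType(ID),
    DerivativeTypeId INTEGER REFERENCES DerivativeType(ID),
    -- a subtype column may only be filled for its own TypeId
    CHECK (BondTypeId IS NULL OR TypeId = 1),
    CHECK (DerivativeTypeId IS NULL OR TypeId = 4)
);

-- Migration: copy the existing SubTypeId values unchanged into the matching column.
INSERT INTO Securities
SELECT ID, Name, TypeId,
       CASE WHEN TypeId = 1 THEN SubTypeId END,
       CASE WHEN TypeId = 4 THEN SubTypeId END
FROM Securities_old;

-- Compatibility view so existing readers still see a single SubTypeId column.
CREATE VIEW SecuritiesCompat AS
SELECT ID, Name, TypeId, COALESCE(BondTypeId, DerivativeTypeId) AS SubTypeId
FROM Securities;
""")

def accepts(row):
    """Try to insert (ID, Name, TypeId, BondTypeId, DerivativeTypeId); report success."""
    try:
        conn.execute("INSERT INTO Securities VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this shape, an unknown bond subtype is rejected by the FK, and a bond subtype on a stock is rejected by the CHECK, while no existing ID value has to change.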

Related

Conditional Column on sql

I'm trying to create a new table in my DB. The table has 2 important columns:
id_brands (an FK to the brands table)
id_veiculo
What I would like to have is something like this:
id_brands  id_veiculo
1          1
1          2
2          1
2          2
3          1
1          3
3          2
I created the table and tried to enforce this condition with a trigger, without success. I don't know if it's possible, or if a trigger is the best way to do it.
Judging by the pattern in your example table, what you are probably trying to do is set up an auxiliary N-to-N relationship table.
In that case, if you create another table for id_veiculo and its properties, both ids can be FKs. The primary key of this auxiliary table would then be the pair id_brands and id_veiculo:
PRIMARY KEY (id_veiculo, id_brands);
Here's another Stack Overflow question about NxM/NxN relationships.
Also, it isn't very clear what you're trying to do with the table, but if it's populating/seeding the data, then yes, a trigger is a viable solution.
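A minimal sketch of that junction table, using Python's sqlite3 module; the brands and veiculos parent tables, their name columns and the brand_veiculo table name are illustrative assumptions. The composite primary key rejects a repeated (id_brands, id_veiculo) pair, and the two FKs reject ids that are missing from the parent tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE brands   (id_brands  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE veiculos (id_veiculo INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Junction table: both columns are FKs, and together they form the PK.
-- NOT NULL is explicit because SQLite otherwise allows NULLs in a composite PK.
CREATE TABLE brand_veiculo (
    id_brands  INTEGER NOT NULL REFERENCES brands(id_brands),
    id_veiculo INTEGER NOT NULL REFERENCES veiculos(id_veiculo),
    PRIMARY KEY (id_veiculo, id_brands)
);

INSERT INTO brands   VALUES (1, 'Brand A'), (2, 'Brand B'), (3, 'Brand C');
INSERT INTO veiculos VALUES (1, 'Vehicle 1'), (2, 'Vehicle 2'), (3, 'Vehicle 3');
INSERT INTO brand_veiculo VALUES (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (1, 3), (3, 2);
""")

def link(id_brands, id_veiculo):
    """Insert a brand/vehicle pair; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO brand_veiculo VALUES (?, ?)", (id_brands, id_veiculo))
        return True
    except sqlite3.IntegrityError:
        return False
```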

Surrogate Keys need to be ordered by natural key

I just want to know whether surrogate keys need to be explicitly ordered by the natural key. Does the same surrogate key always need to be bound to the natural key when a truncate and reload is used, or does it not matter? I have some big tables that use a delta load, which basically just does inserts and updates. I don't want to order the data to ensure the surrogate key and natural key always bind if they don't need to. Isn't that why they are nonsense keys?
The actual numeric value of a surrogate key has nothing to do with the natural key or other fields in the record. That said, once you assign a surrogate key to a record, you should never break that link or risk leaving orphaned fact records in your data.
You can see this most clearly in a slowly changing dimension table that has multiple versions of some natural keys.
sur_key  nat_key  description    version  valid_from  valid_through
1        105      UK Office      1        1900-01-01  2017-02-16
2        108      FR Office      1        1900-01-01  2099-12-31
3        109      NL Office      1        1900-01-01  2099-12-31
4        105      UK/IRL Office  2        2017-02-16  2099-12-31
5        102      DK Office      1        1900-01-01  2099-12-31
As you can see, a new version of natural key 105 just gets the next surrogate key and the old record stays in place. A late arriving key 102 also just gets the next key.
Any ordering of natural keys only happens in an index on that column, never in the table itself.
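The behaviour described above (a changed description closes the old row and gets the next surrogate key; a late-arriving natural key also just gets the next key) can be sketched as a minimal in-memory type-2 SCD in Python. The class and method names are hypothetical, and the sentinel dates come from the example table; a real warehouse would do this in its ETL against the dimension table.

```python
from dataclasses import dataclass
from datetime import date

EARLIEST = date(1900, 1, 1)    # valid_from used for a first version (as in the table)
OPEN_END = date(2099, 12, 31)  # valid_through marking the current version

@dataclass
class DimRow:
    sur_key: int
    nat_key: int
    description: str
    version: int
    valid_from: date
    valid_through: date

class Scd2Dimension:
    """Minimal type-2 SCD: surrogate keys are handed out in arrival order only."""

    def __init__(self):
        self.rows = []
        self._next_key = 1

    def upsert(self, nat_key, description, effective):
        current = next((r for r in self.rows
                        if r.nat_key == nat_key and r.valid_through == OPEN_END), None)
        if current is not None and current.description == description:
            return current.sur_key  # unchanged: keep the existing surrogate key
        if current is not None:
            current.valid_through = effective  # close the old version; its key stays
            version, valid_from = current.version + 1, effective
        else:
            version, valid_from = 1, EARLIEST  # new (or late-arriving) natural key
        row = DimRow(self._next_key, nat_key, description, version, valid_from, OPEN_END)
        self._next_key += 1
        self.rows.append(row)
        return row.sur_key

# Rebuild the example table above.
dim = Scd2Dimension()
for nat_key, desc in [(105, "UK Office"), (108, "FR Office"), (109, "NL Office")]:
    dim.upsert(nat_key, desc, EARLIEST)
dim.upsert(105, "UK/IRL Office", date(2017, 2, 16))  # new version of 105 gets key 4
dim.upsert(102, "DK Office", EARLIEST)               # late-arriving 102 gets key 5
```

Nothing in the sketch sorts by natural key: the key order simply follows arrival order.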
Surrogate keys and natural keys generally should not have any direct relationship. It would be a maintenance nightmare to try to keep them aligned and you would be constantly having to re-assign the keys as new data gets added.
After a truncate and reload of your key table, your dim records may end up with a different SK, requiring your fact records to be updated/reloaded as well.
If that is a recurring scenario, you can include your natural key in the fact table. It does take more space, but it makes reloads and troubleshooting easier.
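To illustrate the reload problem and the natural-key workaround, here is a small Python sketch under made-up assumptions: the dimension's surrogate keys are handed out in load order (assign_surrogate_keys is a hypothetical helper), and each fact row stores its natural key next to its surrogate key, so the surrogate key can be re-derived after a reload.

```python
def assign_surrogate_keys(natural_keys):
    """Hypothetical truncate-and-reload: SKs are handed out in load order."""
    return {nk: sk for sk, nk in enumerate(natural_keys, start=1)}

# First load of the dimension, and facts that reference it.
dim_v1 = assign_surrogate_keys([105, 108, 109])
facts = [
    {"nat_key": 108, "amount": 50, "sur_key": dim_v1[108]},
    {"nat_key": 109, "amount": 70, "sur_key": dim_v1[109]},
]

# Truncate and reload in a different order: the same natural keys get new SKs,
# so the SKs stored in `facts` now point at the wrong dimension rows.
dim_v2 = assign_surrogate_keys([109, 108, 105])

def remap_facts(facts, dim):
    """Re-derive each fact's SK from the natural key it carries."""
    return [{**f, "sur_key": dim[f["nat_key"]]} for f in facts]

facts_v2 = remap_facts(facts, dim_v2)
```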
Inmon described a data warehouse as a subject-oriented, integrated, time-variant, and nonvolatile collection of data.
To implement "integrated", we have to understand the concept of surrogate keys.
For example, suppose we have to create a DWH for a group of retail shops, so we have to pull item detail data from the different shops.
Shop 1
Item
Itemid  Name
1       Tea
2       sugar
Sales
orderid  Itemid  tola
1        1       100
2        1       100
3        2       300

Shop 2
Item
Itemid  Name
1       coffee
2       tea
Sales
orderid  Itemid  tola
1        1       100
2        1       100
3        2       300
Now, Itemid 1 may refer to different items in different shops. When we pull the data from all shops into the warehouse using the natural key (Itemid), we can no longer identify a record by Itemid alone; we need two fields, ItemID and shop number (natural key + data source identifier).
Now, when transaction data (sales) is pulled in, establishing the relationship requires joining on two columns (natural key + data source identifier), which degrades performance.
A second scenario: if you have to implement SCD, you will also require surrogate keys.
In a nutshell, surrogate keys improve (read) performance and help to implement SCD (slowly changing dimensions).
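A minimal Python sketch of that idea, using the two shops' data above: each (shop number, Itemid) pair is mapped to a single integer surrogate key, so the sales facts can join to the item dimension on one column. SurrogateKeyMap is a hypothetical helper; in a warehouse this mapping would typically be persisted in a key-lookup table maintained by the ETL.

```python
from itertools import count

class SurrogateKeyMap:
    """Hypothetical helper: one integer key per (shop number, Itemid) pair."""

    def __init__(self):
        self._keys = {}
        self._next = count(1)

    def key_for(self, shop_no, item_id):
        pair = (shop_no, item_id)  # natural key + data source identifier
        if pair not in self._keys:
            self._keys[pair] = next(self._next)
        return self._keys[pair]

# The Item and Sales data of the two shops above.
items = {1: [(1, "Tea"), (2, "sugar")], 2: [(1, "coffee"), (2, "tea")]}
sales = {1: [(1, 1, 100), (2, 1, 100), (3, 2, 300)],
         2: [(1, 1, 100), (2, 1, 100), (3, 2, 300)]}

keys = SurrogateKeyMap()
# Item dimension keyed by a single surrogate column.
dim_item = {keys.key_for(shop, item_id): name
            for shop, rows in items.items() for item_id, name in rows}
# Sales facts now carry one integer key instead of (shop, Itemid).
fact_sales = [(shop, order_id, keys.key_for(shop, item_id), tola)
              for shop, rows in sales.items() for order_id, item_id, tola in rows]
```

Itemid 1 from shop 1 (Tea) and Itemid 1 from shop 2 (coffee) end up with different surrogate keys, so the fact-to-dimension join is on a single integer column.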
Answers to the questions:
Do surrogate keys need to be explicitly ordered by the natural key? No.
Does the same surrogate key always need to be bound to the natural key when a truncate and reload is used? It depends on the SCD implementation.
Or does it not matter? I have some big tables that use a delta load, which basically just does inserts and updates.
I don't want to order the data to ensure the surrogate key and natural key always bind if they don't need to? You will not be able to relate the data.
Isn't that why they are nonsense keys? Read

Name value pair table vs parent child

I want to store about 100k rows of data, and all rows share some common fields.
Every row has a category, and the other fields depend on the category.
For example, if a row is in category 1, it has extrafield1 and extrafield2.
I searched and found two ways of storing the data.
1-Name value pair
Table1
ID Name Category Field2 Field3
1 Name1 1 Value Value
2 Name2 2 Value Value
Table2
ID Table1_ID Name Value
1 1 extrafield1 1
2 1 extrafield2 2
3 1 extrafield3 3
4 2 extrafield4 4
5 2 extrafield5 5
2-Parent Child table
Table1
ID Name Category Field2 Field3
1 Name1 1 Value Value
2 Name2 2 Value Value
Tableforcategory1
ID Table1_ID extrafield1 extrafield2 extrafield3
1 1 1 2 3
Tableforcategory2
ID Table1_ID extrafield4 extrafield5
1 2 4 5
So my question is: when should I use method 1, and when method 2?
Method 2 is generally preferred for a variety of reasons:
It more closely models the entities represented by the different categories.
It allows for the columns to have different data types.
It makes it easier to implement check constraints for value-only columns.
It makes it easier to implement foreign key constraints for reference columns.
It makes it easier to implement unique constraints, should these be appropriate.
It makes it easier to implement not-NULL and default values.
It makes it easier to add columns on specific attribute values.
And there may be other reasons.
The first method -- which is called entity-attribute-value modeling (EAV) -- is definitely an alternative. It is mostly suitable in two situations:
The number of attributes exceeds the column limit in the database being used.
The attributes are sparsely populated, so only a few are in use for any given entity.
Sometimes a hybrid of these two methods is appropriate, with commonly used attributes being stored in a relational format and sparse attributes stored as EAV.
There are alternative approaches, such as storing the values in a JSON or XML object. These are not generally recommended, but might be suitable in some databases under certain circumstances -- particularly when all attributes need to be treated as a single block and returned and set together.
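The practical difference between the two methods can be seen in a small sqlite3 sketch that mirrors the Table1/Table2/Tableforcategory1 layout from the question (the column types and the CHECK rule on extrafield1 are assumptions added for illustration). Method 1 needs a pivot with one conditional aggregate per attribute and returns every value as text; method 2 is a plain join with typed columns, and its CHECK constraint rejects a bad value that the EAV table happily accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Name TEXT, Category INTEGER);

-- Method 1 (EAV): every extra field is a row; all values share one untyped column.
CREATE TABLE Table2 (
    ID INTEGER PRIMARY KEY,
    Table1_ID INTEGER REFERENCES Table1(ID),
    Name TEXT NOT NULL,
    Value TEXT
);

-- Method 2: one child table per category, with typed columns and constraints.
CREATE TABLE Tableforcategory1 (
    ID INTEGER PRIMARY KEY,
    Table1_ID INTEGER NOT NULL UNIQUE REFERENCES Table1(ID),
    extrafield1 INTEGER NOT NULL CHECK (extrafield1 >= 0),
    extrafield2 INTEGER NOT NULL,
    extrafield3 INTEGER
);

INSERT INTO Table1 VALUES (1, 'Name1', 1), (2, 'Name2', 2);
INSERT INTO Table2 (Table1_ID, Name, Value) VALUES
    (1, 'extrafield1', '1'), (1, 'extrafield2', '2'), (1, 'extrafield3', '3');
INSERT INTO Tableforcategory1 (Table1_ID, extrafield1, extrafield2, extrafield3)
VALUES (1, 1, 2, 3);
""")

# Method 1: pivot the attribute rows back into columns.
EAV_PIVOT = """
SELECT t.ID,
       MAX(CASE WHEN e.Name = 'extrafield1' THEN e.Value END) AS extrafield1,
       MAX(CASE WHEN e.Name = 'extrafield2' THEN e.Value END) AS extrafield2,
       MAX(CASE WHEN e.Name = 'extrafield3' THEN e.Value END) AS extrafield3
FROM Table1 t JOIN Table2 e ON e.Table1_ID = t.ID
WHERE t.Category = 1
GROUP BY t.ID
"""
# Method 2: a plain join.
CHILD_JOIN = """
SELECT t.ID, c.extrafield1, c.extrafield2, c.extrafield3
FROM Table1 t JOIN Tableforcategory1 c ON c.Table1_ID = t.ID
"""
eav_row = conn.execute(EAV_PIVOT).fetchone()
child_row = conn.execute(CHILD_JOIN).fetchone()

def accepts(sql):
    """Run an INSERT; return False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False
```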
It depends on the type of queries and the stability of the data model.
If your queries are essentially static, meaning you know when you are going to use "extrafield_x", then method 2 is simpler and more efficient, but less flexible.
If you need more dynamic queries and over time you might add more categories and more "extrafields", method 1 is more flexible and requires no data model maintenance, but it is more complex to use and probably slower.

SQL Server "pseudo/synthetic" composite Id(key)

Sorry, but I don't know what to call what I need in the title.
I want to create a unique key where each two digits of the number identify another table's PK. Let's say I have the PKs below in these 3 tables:
Id  Company    Id  Area       Id  Role
1   Abc        1   HR         1   Assistant
2   Xyz        2   Financial  2   Manager
3   Qwe        3   Sales      3   VP
Now I need to insert values into another table. I know I could use 3 columns and create a composite key to get integrity and uniqueness, as below:
Id_Company Id_Area Id_Role ...Other_Columns.....
1 2 1
1 1 2
2 2 2
3 3 3
But I was thinking of creating a single column where each group of X digits identifies one FK. The first 3 columns of the table above would then become the following (supposing each digit is one FK):
Id ...Other_Columns.....
121
112
222
333
I don't know what it's called, or even whether it's stupid, but it makes sense to me: I can select on a single column, and if I need a join I just split the number every X digits according to my definition.
It's called a "smart", "intelligent" or "concatenated" key. It's a bad idea. It is fragile, leads to update problems and impedes the DBMS. The DBMS and query language are designed for you to describe your application via base tables in a straightforward way. Use them as they were intended.
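The fragility is easy to demonstrate. The sketch below (Python, hypothetical helper names) implements the asker's one-digit-per-FK scheme: as soon as an ID needs two digits, different FK combinations collide on the same key, and decoding silently returns the wrong IDs. A composite key over three integer columns has no such limit.

```python
def encode_smart_key(company_id, area_id, role_id, width=1):
    """Concatenate each FK as a fixed-width decimal field (the asker's scheme)."""
    return int(f"{company_id:0{width}d}{area_id:0{width}d}{role_id:0{width}d}")

def decode_smart_key(key, width=1):
    """Split the key back into three FKs, `width` digits each."""
    digits = f"{key:0{3 * width}d}"
    return tuple(int(digits[i:i + width]) for i in range(0, 3 * width, width))

ok = encode_smart_key(1, 2, 1)           # fits the one-digit assumption
clash_a = encode_smart_key(12, 1, 1)     # Company 12 overflows its digit...
clash_b = encode_smart_key(1, 21, 1)     # ...and collides with Area 21
wrong = decode_smart_key(clash_a)        # decodes to the wrong FKs
```

Widening the fields only moves the limit, and every existing key has to be rewritten when it is reached, which is the update problem the answer mentions.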

How to design table between three important columns (products,designs,colorImages) in SQL Server

I designed three tables
Products
Designs
Colorimages
What is the best way to design extra tables to join these three tables together?
Each product has more than one design, and each design has more than one color image.
Example
productid  designid  colorimageid  picturename
1          1         1             img1
1          1         2             img2
1          2         1             img3
1          2         3             img4
1          3         1             img5
2          1         1             img6
2          1         2             img7
2          2         1
2          3         3
2          3         4
How can I design it for high performance?
Create a table (called ProductDesignColor perhaps) with fields
ProductID
DesignID
ColorImageID
plus its own ID column as an IDENTITY column called ID
For high speed, keep the integer data types at the smallest sensible size, perhaps 32-bit integers.
Build a Clustered index on the ID of the table.
Build a composite index on the 3 main ID fields (P, D, C); most of your queries will no doubt use this index.
Also create single-column indexes on the 3 ID fields (P, D, C), so 3 indexes in total, but only if you will be querying by a single ID value.
Also, performance is a function of your complete design, so your Product table should have ProductID as its primary key (index), and the same goes for the other tables. Indexes are the key to performance, but they have to be considered and used carefully. If you have the RAM, the more indexes the better (unless you are doing lots of inserts/updates).
So a good, neat table design also leads to well-planned, minimal, powerful indexes that can fit neatly in the available RAM.
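A sketch of the suggested table in Python's sqlite3 module, as a stand-in for SQL Server: SQLite has no IDENTITY or CLUSTERED keyword, but an INTEGER PRIMARY KEY column is the rowid the table is stored by, which plays a similar role. The parent tables are reduced to their key columns, and only the composite (P, D, C) index is created (ix_pdc is a hypothetical name); the single-column indexes would be added the same way if needed. EXPLAIN QUERY PLAN shows whether the planner picks ix_pdc; its exact wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE Products    (ProductID    INTEGER PRIMARY KEY);
CREATE TABLE Designs     (DesignID     INTEGER PRIMARY KEY);
CREATE TABLE ColorImages (ColorImageID INTEGER PRIMARY KEY);

-- INTEGER PRIMARY KEY is SQLite's auto-assigned rowid, roughly the role of
-- the IDENTITY column with a clustered index in the answer.
CREATE TABLE ProductDesignColor (
    ID           INTEGER PRIMARY KEY,
    ProductID    INTEGER NOT NULL REFERENCES Products(ProductID),
    DesignID     INTEGER NOT NULL REFERENCES Designs(DesignID),
    ColorImageID INTEGER NOT NULL REFERENCES ColorImages(ColorImageID)
);
-- Composite index on the 3 main ID fields (P, D, C).
CREATE INDEX ix_pdc ON ProductDesignColor (ProductID, DesignID, ColorImageID);

INSERT INTO Products VALUES (1), (2);
INSERT INTO Designs VALUES (1), (2), (3);
INSERT INTO ColorImages VALUES (1), (2), (3), (4);
INSERT INTO ProductDesignColor (ProductID, DesignID, ColorImageID) VALUES
    (1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 3), (1, 3, 1),
    (2, 1, 1), (2, 1, 2), (2, 2, 1), (2, 3, 3), (2, 3, 4);
""")

QUERY = """SELECT ColorImageID FROM ProductDesignColor
           WHERE ProductID = ? AND DesignID = ? ORDER BY ColorImageID"""
colors = [r[0] for r in conn.execute(QUERY, (2, 3))]

# Inspect how SQLite plans the lookup.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (2, 3)).fetchall()
print(plan)
```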