Is it good practice to have two SQL tables with bijective row correspondence?

I have a table of tasks,
id | name
----+-------------
1 | brush teeth
2 | do laundry
and a table of states.
taskid | state
--------+-------------
1 | completed
2 | uncompleted
There is a bijective correspondence between the tables, i.e.
each row in the task table corresponds to exactly one row in the state table.
Another way of implementing this would be to place a state column in the task table.
id | name | state
----+-------------+-------------
1 | brush teeth | completed
2 | do laundry | uncompleted
The main reason I chose two tables instead of this one is that updating the state would then cause a change in the task id.
I have other tables referencing the task(id) column, and do not want to have to update all those other tables too when altering a task's state.
I have two questions about this.
Is it good practice to have two tables in bijective row-row correspondence?
Is there a way I can ensure a constraint that there is exactly one row in the state table corresponding to each row in the task table?
The system I am using is postgresql.

You can ensure the 1-1 correspondence by making the id in each table a primary key and a foreign key that references the id in the other table. This is allowed and it guarantees 1-1'ness.
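A minimal sketch of the mutual foreign keys described above. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, purely so the example is self-contained; the helper name insert_task is made up for illustration. Because each table references the other, the constraints are declared DEFERRABLE INITIALLY DEFERRED so that both rows can be inserted in one transaction and checked at COMMIT. In PostgreSQL the referenced table must already exist when a foreign key is declared, so one of the two constraints would be added afterwards with ALTER TABLE.

```python
import sqlite3

# Autocommit mode, so the transactions below are written out explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE task (
    id   INTEGER PRIMARY KEY
         REFERENCES state(taskid) DEFERRABLE INITIALLY DEFERRED,
    name TEXT NOT NULL
);
CREATE TABLE state (
    taskid INTEGER PRIMARY KEY
           REFERENCES task(id) DEFERRABLE INITIALLY DEFERRED,
    state  TEXT NOT NULL
);
""")


def insert_task(task_id, name, state=None):
    """Insert a task and, optionally, its state row in one transaction.

    Returns True if the commit succeeds and False if a constraint rejects it.
    """
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO task (id, name) VALUES (?, ?)", (task_id, name))
        if state is not None:
            conn.execute(
                "INSERT INTO state (taskid, state) VALUES (?, ?)", (task_id, state)
            )
        conn.execute("COMMIT")  # deferred foreign keys are checked here
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False


with_state = insert_task(1, "brush teeth", "completed")
without_state = insert_task(2, "do laundry")  # no matching state row
task_count = conn.execute("SELECT COUNT(*) FROM task").fetchone()[0]
```

The task without a state row is rejected at commit time, so the two tables can never drift apart.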
Sometimes, you want such tables, but one table has fewer rows than the other. This occurs when there is a subsetting relationship, and you don't want the additional columns on all rows.
Another purpose is to store separate columns in different places. When I learned about databases, this approach was called vertical partitioning. Nowadays, columnar databases are relatively common; these take the notion to the extreme -- a separate "store" for each column (although the "store" is not exactly a "table").
Why would you do this? Here are some reasons:
You have infrequently used columns that you do not want to load for every query on the more frequent columns.
You have frequently updated columns and you do not want to lock the rest of the columns.
You have too many columns to store in one row.
You have different security requirements on different columns.
Postgres does offer other mechanisms that you might find relevant. In particular, table inheritance might be useful in your situation.
All that said, you would not normally design a database like this. There are good reasons for doing so, but it is more typical to put all columns related to an entity in the same table.
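For comparison, here is a small sketch of the single-table layout, again using sqlite3 as a stand-in for PostgreSQL. The reminder table is hypothetical and only represents "other tables referencing task(id)". It shows that updating the state column leaves the id, and every row that references it, untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE task (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    state TEXT NOT NULL DEFAULT 'uncompleted'
);
-- Hypothetical table standing in for anything that references task(id).
CREATE TABLE reminder (
    id     INTEGER PRIMARY KEY,
    taskid INTEGER NOT NULL REFERENCES task(id),
    note   TEXT
);
INSERT INTO task (id, name) VALUES (1, 'brush teeth'), (2, 'do laundry');
INSERT INTO reminder (taskid, note) VALUES (2, 'use cold water');
""")

# Changing the state touches only the state column; the primary key stays put.
conn.execute("UPDATE task SET state = 'completed' WHERE id = 2")
conn.commit()

rows = conn.execute("""
    SELECT task.id, task.name, task.state, reminder.note
    FROM reminder
    JOIN task ON task.id = reminder.taskid
""").fetchall()
```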

Related

Turn two database tables into one?

I am having a bit of trouble modelling a relational database for an inventory management system. For now, it only has 3 simple tables:
Product
ID | Name | Price
Receivings
ID | Date | Quantity | Product_ID (FK)
Sales
ID | Date | Quantity | Product_ID (FK)
As Receivings and Sales are identical, I was considering a different approach:
Product
ID | Name | Price
Receivings_Sales (the name doesn't matter)
ID | Date | Quantity | Type | Product_ID (FK)
The column type would identify if it was receiving or sale.
Can anyone help me choose the best option, pointing out the advantages and disadvantages of either approach?
The first one seems reasonable because I am thinking in an ORM way.
Thanks!
Personally I prefer the first option, that is, separate tables for Sales and Receiving.
The two biggest disadvantages of option 2, merging the two tables into one, are:
1) Inflexibility
2) Unnecessary filtering on every use
First, inflexibility. If your requirements expand (or you simply overlooked something), you will have to break up your schema or you will end up with unnormalized tables. For example, say your sales must now record the sales clerk who handled the transaction; that has nothing to do with receiving. What if you do both retail and wholesale sales: how would you accommodate that in the merged table? How about discounts or promos? Those are only the obvious cases. Now consider receiving. What if we want to tie receivings to purchase orders? Purchase order details such as P.O. number, P.O. date and supplier name do not belong with sales; they relate to receiving.
Second, unnecessary filtering. If you have a merged table and want only the sales (or receiving) portion, you have to filter out the other portion, either in your back-end or in your front-end program. With separate tables you deal with one table at a time.
Additionally, you mentioned ORM. The first option fits that best, because each object or entity should be distinct from the others.
If the tables really are and always will be identical (and I have my doubts), then name the unified table something more generic, like "InventoryTransaction", and then use negative numbers for one of the transaction types: probably sales, since that would correctly mark your inventory in terms of keeping track of stock on hand.
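A rough sketch of that signed-quantity idea, using Python's sqlite3 module so it runs anywhere. The sample product and dates are invented, and stock_on_hand is a hypothetical helper: receivings are stored as positive quantities and sales as negative ones, so stock on hand is a plain SUM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Product (
    ID    INTEGER PRIMARY KEY,
    Name  TEXT NOT NULL,
    Price NUMERIC NOT NULL
);
CREATE TABLE InventoryTransaction (
    ID         INTEGER PRIMARY KEY,
    Date       TEXT NOT NULL,
    -- positive = receiving, negative = sale
    Quantity   INTEGER NOT NULL CHECK (Quantity <> 0),
    Product_ID INTEGER NOT NULL REFERENCES Product(ID)
);
-- Invented sample data.
INSERT INTO Product VALUES (1, 'Widget', 2.50), (2, 'Gadget', 9.99);
INSERT INTO InventoryTransaction (Date, Quantity, Product_ID) VALUES
    ('2024-01-02', 10, 1),
    ('2024-01-03', -3, 1),
    ('2024-01-05', -2, 1);
""")


def stock_on_hand(product_id):
    """Sum the signed quantities; a product with no transactions has 0."""
    row = conn.execute(
        "SELECT COALESCE(SUM(Quantity), 0) FROM InventoryTransaction "
        "WHERE Product_ID = ?",
        (product_id,),
    ).fetchone()
    return row[0]


widget_stock = stock_on_hand(1)
gadget_stock = stock_on_hand(2)
```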
The fact that headings are the same is irrelevant. Seeking to use a single table because headings are the same is misconceived.
-- person [source] loves person [target]
LOVES(source,target)
-- person [source] hates person [target]
HATES(source,target)
Every base table has a corresponding predicate aka fill-in-the-[named-]blanks statement describing the application situation. A base table holds the rows that make a true statement.
Every query expression combines base table names via JOIN, UNION, SELECT, EXCEPT, WHERE condition, etc and has a corresponding predicate that combines base table predicates via (respectively) AND, OR, EXISTS, AND NOT, AND condition, etc. A query result holds the rows that make a true statement.
Such a set of predicate-satisfying rows is a relation. There is no other reason to put rows in a table.
(The other answers here address, as they must, proposals for and consequences of the predicate that your one table could have. But if you didn't propose the table because of its predicate, why did you propose it at all? The answer is, since not for the predicate, for no good reason.)
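To make the operator/predicate correspondence concrete, here is a small sketch with the LOVES and HATES tables from above, run through Python's sqlite3 module. The names in the sample rows are invented. Each query's result holds exactly the rows that make the combined statement true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LOVES (source TEXT, target TEXT, PRIMARY KEY (source, target));
CREATE TABLE HATES (source TEXT, target TEXT, PRIMARY KEY (source, target));
-- Invented sample rows.
INSERT INTO LOVES VALUES ('ann', 'bob'), ('bob', 'cat'), ('cat', 'ann');
INSERT INTO HATES VALUES ('ann', 'bob'), ('bob', 'ann');
""")

# JOIN ~ AND: [source] loves [target] AND [source] hates [target]
loves_and_hates = conn.execute("""
    SELECT source, target FROM LOVES JOIN HATES USING (source, target)
    ORDER BY source, target
""").fetchall()

# UNION ~ OR: [source] loves [target] OR [source] hates [target]
loves_or_hates = conn.execute("""
    SELECT source, target FROM LOVES
    UNION
    SELECT source, target FROM HATES
    ORDER BY source, target
""").fetchall()

# EXCEPT ~ AND NOT: [source] loves [target] AND NOT [source] hates [target]
loves_not_hates = conn.execute("""
    SELECT source, target FROM LOVES
    EXCEPT
    SELECT source, target FROM HATES
    ORDER BY source, target
""").fetchall()
```

The two tables have identical headings, yet merging them would lose the distinction between the two predicates unless a type column were added to carry it.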

What should a relationships table look like - Need confirmation of my technique

Let's say I have 3 models:
User
Page
Comments
I asked a question based on if I should have each model keep track of its relationships: SQL relationships and best practices
an example of this would be a "Pages" table that states who its author was... The problem seemed to be that if 2 users were the author of the one page, you'd have to add a new specific table called PageRelationshipsWithUsers that might have a reference to the PageID and the UserID that created it and a separate row for the co-author.
Understandably this sounds a bit naff. I would end up with a heck load of relation tables and most likely, it could be replaced with just the one multi-purpose relationship table... So I decided to come up with a relationships table like the following:
Relationships Table New
RelationshipID | ItemID | LinkID | ItemType | LinkType | Status
-----------------------------------------------------------------------------
1 | 23(PageID) | 7(UserID) | ("Page") | ("User") | TRUE
2 | 22(CommentID) | 7(UserID) | ("Comment") | ("User") | TRUE
3 | 22(CommentID) | 23(PageID) | ("Comment") | ("Page") | TRUE
however, I would very much appreciate some input as to how good of an idea laying out my relationships table like this is.
Any thoughts?
Answer was told to me by a work colleague:
Imagine the above relationships table for the model "Book"
A User can Rent a book, so the relation is User -> Book...
But what if he can buy a book too: User->Book....
Ooops, we need a new relationship... and considering this relationship table was supposed to be the 1 size fits all, we now have a requirement to add a new separate table... whoops.
So the answer is NO NO NO. don't, it's naughty. Keep your relationship tables separate and specific.
Your suggestion for a relationship table is not optimal for several reasons:
It's difficult to write queries that join tables through the relationship table, as you will need filters on the ItemType and LinkType columns, which is not intuitive when writing queries.
If a need arises to add new entities in the future that use different datatypes for their primary keys, you cannot easily store IDs of various datatypes in your ItemID and LinkID columns.
You cannot create explicit foreign keys in your database, to enforce referential integrity, which is possibly the best reason to avoid the design you suggest.
Query performance might suffer.
When normalizing a database, you should not be afraid to have many tables. Just make sure to use a naming convention that makes sense and is self-documenting. For example, you could name the relation table between authors and pages "PageAuthors", instead of "Pages".
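A minimal sketch of one specific relationship table, assuming the PageAuthors name suggested above and hypothetical Users/Pages columns. It uses Python's sqlite3 module; add_author is a made-up helper. The point is that explicit foreign keys let the database itself reject links to rows that do not exist, which the generic ItemID/LinkID design cannot do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Pages (PageID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE PageAuthors (
    PageID INTEGER NOT NULL REFERENCES Pages(PageID),
    UserID INTEGER NOT NULL REFERENCES Users(UserID),
    PRIMARY KEY (PageID, UserID)   -- one row per (page, author) pair
);
-- Invented sample data: page 23 has an author and a co-author.
INSERT INTO Users VALUES (7, 'alice'), (8, 'bob');
INSERT INTO Pages VALUES (23, 'Home');
INSERT INTO PageAuthors VALUES (23, 7), (23, 8);
""")


def add_author(page_id, user_id):
    """Link a user to a page; returns False if the database rejects the link."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO PageAuthors VALUES (?, ?)", (page_id, user_id))
        return True
    except sqlite3.IntegrityError:
        return False


authors = [
    row[0]
    for row in conn.execute("""
        SELECT Users.Name
        FROM PageAuthors
        JOIN Users USING (UserID)
        WHERE PageID = 23
        ORDER BY Users.Name
    """)
]
unknown_user = add_author(23, 99)  # user 99 does not exist
unknown_page = add_author(99, 7)   # page 99 does not exist
```

No ItemType or LinkType filter is needed in the join, because the table name already says what the relationship is.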

SQL Server FK same table

I'm thinking of adding a relationship table to a database and I'd like to include a sort of reverse relation functionality by using a FK pointing to a PK within the same table. For example, say I have a table RELATIONSHIP with the following:
ID (PK) | Relation    | ReverseID (FK)
--------+-------------+----------------
1       | Parent      | 2
2       | Child       | 1
3       | Grandparent | 4
4       | Grandchild  | 3
5       | Sibling     | 5
First, is this even possible? Second, is this a good way to go about this? If not, what are your suggestions?
1) It is possible.
2) It may not be as desirable in your case as you might want. You have cycles, as opposed to an acyclic structure, so with the FK in place you cannot insert those rows one at a time as they are. One possibility is to allow NULLs in the ReverseID column in your table DDL, INSERT all the rows with NULL ReverseID, and then UPDATE the ReverseID columns, which will by then have valid rows to reference. Another possibility is to disable the foreign key, or not create it until the data is in a completely valid state, and then apply it.
3) You would have to do an operation like this almost every time, and if EVERY relationship has an inverse you either would not be able to enforce NOT NULL in the schema or you would regularly be disabling and re-enabling constraints.
4) The sibling situation is the same.
I would be fine using the design if this is controlled in some way and you understand the implications.
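The insert-with-NULL-then-UPDATE approach can be sketched as follows. This uses Python's sqlite3 module rather than SQL Server, so treat it as an illustration of the pattern only; insert_direct and insert_pair are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
CREATE TABLE Relationship (
    ID        INTEGER PRIMARY KEY,
    Relation  TEXT NOT NULL,
    ReverseID INTEGER REFERENCES Relationship(ID)  -- nullable on purpose
)
""")


def insert_direct():
    """Try to insert 'Parent' pointing at a row that does not exist yet."""
    try:
        with conn:
            conn.execute("INSERT INTO Relationship VALUES (1, 'Parent', 2)")
        return True
    except sqlite3.IntegrityError:
        return False


def insert_pair(id_a, name_a, id_b, name_b):
    """Insert both rows with NULL links, then point them at each other."""
    with conn:  # one transaction for the whole pair
        conn.executemany(
            "INSERT INTO Relationship VALUES (?, ?, NULL)",
            [(id_a, name_a), (id_b, name_b)],
        )
        conn.execute(
            "UPDATE Relationship SET ReverseID = ? WHERE ID = ?", (id_b, id_a)
        )
        conn.execute(
            "UPDATE Relationship SET ReverseID = ? WHERE ID = ?", (id_a, id_b)
        )


direct_ok = insert_direct()
insert_pair(1, "Parent", 2, "Child")
rows = conn.execute(
    "SELECT ID, Relation, ReverseID FROM Relationship ORDER BY ID"
).fetchall()
```

The cost, as noted in point 3, is that ReverseID has to stay nullable for this to work.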

How to bond N database table with one master-table?

Let's assume that I have N tables for N bookstores. I have to keep data about books in a separate table for each bookstore, because each table has a different schema (the number and types of columns differ); however, there is a set of columns common to all the bookstore tables.
Now I want to create one "MasterTable" with only few columns.
| MasterTable |
|id. | title| isbn|
| 1 | abc | 123 |
| MasterToBookstores |
|m_id | tb_id | p_id |
| 1 | 1 | 2 |
| 1 | 2 | 1 |
| BookStore_Foo |
|p_id| title| isbn| date | size|
| 1 | xyz | 456 | 1998 | 3KB |
| 2 | abc | 123 | 2003 | 4KB |
| BookStore_Bar |
|p_id| title| isbn| publisher | Format |
| 1 | abc | 123 | H&K | PDF |
| 2 | mnh | 986 | Amazon | MOBI |
My question: is it right to keep data this way? What are the best practices for this and similar cases? Can I give a particular bookstore table a numeric alias, which would help me manage the whole set of tables?
Is there a better way of doing such thing?
I think you are confusing the concepts of "store" and "book".
From your comments and the example data, it appears the problem is in having different sets of attributes for books, not stores. If so, you'll need a structure similar to this:
The symbol [not shown] denotes inheritance [1]. BOOK is the "base class" and BOOK1/BOOK2/BOOK3 are various "subclasses" [2]. This is a common strategy when entities share a set of attributes or relationships [3]. For a fuller explanation of this concept, please search for "Subtype Relationships" in the ERwin Methods Guide.
Unfortunately, inheritance is not directly supported by current relational databases, so you'll need to transform this hierarchy into plain tables. There are generally 3 strategies for doing so, as described in these posts:
Interpreting ER diagram
Parent and Child tables - ensuring children are complete
Supertype-subtype database design
NOTE: The structure above allows various book types to be mixed inside the same bookstore. Let me know if that's not desirable (i.e. you need exactly one type of books in any given bookstore)...
[1] Aka. category, subclassing, subtyping, generalization hierarchy etc.
[2] I.e. types of books, depending on which attributes they require.
[3] In this case, books of all types are in the many-to-many relationship with stores.
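As a rough illustration of one common way to flatten such a hierarchy into plain tables (one base table plus one table per subtype sharing the base table's key), here is a sketch using Python's sqlite3 module. The subtype table names book_dated and book_published are invented; their columns are taken from the BookStore_Foo and BookStore_Bar examples in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Base "class": the columns every book has.
CREATE TABLE book (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    isbn  TEXT NOT NULL UNIQUE
);
-- Hypothetical subtype with the extra columns seen in BookStore_Foo.
CREATE TABLE book_dated (
    book_id INTEGER PRIMARY KEY REFERENCES book(id),
    date    INTEGER,
    size    TEXT
);
-- Hypothetical subtype with the extra columns seen in BookStore_Bar.
CREATE TABLE book_published (
    book_id   INTEGER PRIMARY KEY REFERENCES book(id),
    publisher TEXT,
    format    TEXT
);
INSERT INTO book VALUES (1, 'abc', '123'), (2, 'xyz', '456'), (3, 'mnh', '986');
INSERT INTO book_dated VALUES (1, 2003, '4KB'), (2, 1998, '3KB');
INSERT INTO book_published VALUES (3, 'Amazon', 'MOBI');
""")

# Common attributes come from the base table alone.
all_titles = [r[0] for r in conn.execute("SELECT title FROM book ORDER BY id")]

# Subtype attributes need one join on the shared key.
published = conn.execute("""
    SELECT book.title, book_published.publisher, book_published.format
    FROM book
    JOIN book_published ON book_published.book_id = book.id
""").fetchall()
```

A many-to-many table between stores and book(id) would sit on top of this, as footnote 3 describes; it is left out here to keep the sketch short.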
If at least two columns are used by all the other tables, you could have a base table for all books and add more tables for the rest of the data, keyed by the id from the base table.
UPDATE:
If you use Entity Framework to connect to your DB, I suggest you try this:
Create your entity model, something like this:
Then let Entity Framework generate the database (Update database from Model) for you. Note that this uses inheritance (though not in the database).
Let me know if you have questions.
Suggested data model:
1. Have a master database, which stores the master data.
2. The dimension tables in the master database are transactionally replicated to your distributed bookstore databases.
3. You can choose to use updatable subscribers; merge replication is also a good choice.
4. Each distributed bookstore database still works independently; however, master data is merged back either by merge replication or through an updatable subscriber.
5. If you want to guarantee master data integrity, you can use read-only subscribers and transactional replication to distribute master data into the distributed databases, but in this design you need stored procedures in the master database to register your dimension data. Make sure there is no double-hop issue.
I would suggest having two tables:
bookStores:
id name someMoreColumns
books:
id bookStore_id title isbn date publisher format size someMoreColumns
It's easy to see the relationship here: a bookStore has many books.
Note that I'm putting all the columns from all of your BookStore tables into just one table, even if a row from some table has no value for some column.
Why I prefer this way:
1) Of all the data from the BookStore tables, only a few columns will never have a value in the books table (for example, size and format if you don't have an e-book version). The other columns can be filled in someday (you can set a date for your e-books, even though your BookStore_Bar table, which seems to hold the e-books, lacks this column). This way you can keep much more detailed information about all your books if someday you want to update it.
2) If you have a bunch of BookStore tables, let's say 12, you will not be able to handle your data easily. If you want to run a query over all your books (which means over all your tables), you have at least three ways:
First: manually run the query against each of the 12 tables and then merge the data;
Second: write a query with 12 joins, or list 12 tables in your FROM clause, to query all your data;
Third: depend on some script, stored procedure or software to do the first or the second way for you.
I like to be able to work with my data as easily as possible, without depending on some other script or software unless I really need it.
3) In MySQL (which I know best) you can use partitions on your books table. Partitioning is a data management feature that lets you distribute a table's data across several files on disk instead of the single file a table normally occupies. It is very useful when handling a large amount of data in one table, and it speeds up queries according to your data distribution plan. Let's see an example:
Let's say you already have 12 distinct bookStores, but under my database model. Each row in your books table is associated with one of the 12 bookStores. If you partition your data on bookStore_id, it will be almost the same as having 12 tables, because you can create a partition for each bookStore_id, and each partition will then hold only the data matching that bookStore_id.
Let's say you want to query the books table for bookStore_id in (1, 4, 9). If your query really needs only these three partitions to produce the desired output, the others will not be read, and it will be as fast as querying each separate table.
You can drop a partition and the others will not be affected. You can add new partitions to handle new bookStores. You can subpartition a partition. You can merge two partitions. In a nutshell, you can turn your single books table into an easy-to-handle, multi-storage table.
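Here is a small sketch of the two-table model from this answer, with the question's sample rows loaded into a single books table. It runs on Python's sqlite3 module, which has no table partitioning, so the MySQL partitioning step itself is not shown; stores_selling is a hypothetical helper. It illustrates point 2: one query covers every bookstore.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE bookStores (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE books (
    id           INTEGER PRIMARY KEY,
    bookStore_id INTEGER NOT NULL REFERENCES bookStores(id),
    title        TEXT NOT NULL,
    isbn         TEXT NOT NULL,
    -- Columns that only some stores fill in stay NULL for the others.
    date         INTEGER,
    publisher    TEXT,
    format       TEXT,
    size         TEXT
);
INSERT INTO bookStores VALUES (1, 'Foo'), (2, 'Bar');
INSERT INTO books (bookStore_id, title, isbn, date, size) VALUES
    (1, 'xyz', '456', 1998, '3KB'),
    (1, 'abc', '123', 2003, '4KB');
INSERT INTO books (bookStore_id, title, isbn, publisher, format) VALUES
    (2, 'abc', '123', 'H&K', 'PDF'),
    (2, 'mnh', '986', 'Amazon', 'MOBI');
""")


def stores_selling(isbn):
    """Names of all bookstores that list the given ISBN, in one query."""
    return [
        row[0]
        for row in conn.execute(
            """
            SELECT bookStores.name
            FROM books
            JOIN bookStores ON bookStores.id = books.bookStore_id
            WHERE books.isbn = ?
            ORDER BY bookStores.name
            """,
            (isbn,),
        )
    ]


stores_with_abc = stores_selling("123")
books_per_store = conn.execute("""
    SELECT bookStore_id, COUNT(*) FROM books
    GROUP BY bookStore_id ORDER BY bookStore_id
""").fetchall()
```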
Side Effects:
1) I don't know all of table partitioning, so it's good to refer to the documentation to learn all important points to create and manage it.
2) Take care of data with regular backups (dumps) as you probably may have a very populated table books.
I hope it helps you!

redundant column

I have a database that has two tables, these tables look like this
codes
id | code | member_id
1 | 123 | 2
2 | 234 | 1
3 | 345 |
4 | 456 | 3
members
id | code_id | other info
1 | 2 | blabla
2 | 1 | blabla
3 | 4 | blabla
The basic idea is that if a code is taken, then its member_id field is filled in. However, this creates a circular link (members points to codes, codes points to members). Is there a different way of doing this? Is this actually a bad thing?
Update
To answer your questions: there are three different code tables with approximately 3.5 million codes each, and each table is searched depending on different criteria. If the member_id column is empty, the code is unclaimed; otherwise the code is claimed. This is done so that when we search the database we do not need to include another table to tell whether a code is claimed.
The members table contains the claimants for every single code, so all 10.5 million members.
The additional info has things like mobile and flybuys.
The mobile is how we identify the member, but each entry is considered a different member.
It's a bad thing because you can end up with anomalies. For example:
codes
id | code | member_id
1 | 123 | 2
members
id | code_id | other info
2 | 4 | blabla
See the anomaly? Code 1 references its corresponding member, but that member doesn't reference the same code in return. The problem with anomalies is you can't tell which one is the correct, intended reference and which one is a mistake.
Eliminating redundant columns reduces the chance for anomalies. This is a simple process that follows a few very well defined rules, called rules of normalization.
In your example, I would drop the codes.member_id column. I infer that a member must reference a code, but a code does not necessarily reference a member. So I would make members.code_id reference codes.id. But it could go the other way; you don't give enough information for the reader to be sure (as @OMG Ponies commented).
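A sketch of that layout, using Python's sqlite3 module and the sample rows from the question. The UNIQUE constraint on members.code_id is an assumption (it encodes "each code is claimed by at most one member"), other_info stands in for the "other info" column, and claim is a hypothetical helper. Unclaimed codes are found with a LEFT JOIN instead of a redundant codes.member_id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE codes (
    id   INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE
);
CREATE TABLE members (
    id         INTEGER PRIMARY KEY,
    -- Assumption: a code can be claimed by at most one member.
    code_id    INTEGER NOT NULL UNIQUE REFERENCES codes(id),
    other_info TEXT
);
INSERT INTO codes VALUES (1, '123'), (2, '234'), (3, '345'), (4, '456');
INSERT INTO members VALUES (1, 2, 'blabla'), (2, 1, 'blabla'), (3, 4, 'blabla');
""")


def unclaimed_codes():
    """Codes that no member row points at."""
    return [
        row[0]
        for row in conn.execute("""
            SELECT codes.code
            FROM codes
            LEFT JOIN members ON members.code_id = codes.id
            WHERE members.id IS NULL
            ORDER BY codes.id
        """)
    ]


def claim(member_id, code_id):
    """Create a member claiming a code; False if the database rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO members (id, code_id, other_info) "
                "VALUES (?, ?, 'blabla')",
                (member_id, code_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


before = unclaimed_codes()
double_claim = claim(5, 2)   # code 2 already belongs to member 1
first_claim = claim(4, 3)    # code 3 was free
after = unclaimed_codes()
```

With only one column holding the link, the two sides can no longer disagree with each other.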
Yeah, this is not good because it presents opportunities for data integrity problems. You've got a one-to-one relationship, so either remove Code_id from the members table, or member_id from the codes table. (in this case it seems like it would make more sense to drop code_id from members since it sounds like you're more frequently going to be querying codes to see which are not assigned than querying members to see which have no code, but you can make that call)
You could simply drop the member_id column and use a foreign key relationship (or its absence) to signify the relationship or lack thereof. The code_id column would then be used as a foreign key to the code. Personally, I do think it's bad simply because it makes it more work to ensure that you don't have corrupt relationships in the DB -- i.e., you have to check that the two columns are synchronized between the tables -- and it doesn't really add anything in the general case. If you are running into performance problems, then you may need to denormalize, but I'd wait until it was definitely a problem (and you'd likely replicate more than just the id in that case).
It depends on what you're doing. If each member always gets exactly one unique code then just put the actual code in the member table.
If there are a set of codes and several members share a code (but each member still has just one) then remove the member_id from the codes table and only store the unique codes. Access a specific code through a member. (you can still join the code table to search on codes)
If a member can have multiple codes, then remove the code_id from the member table and the member_id from the code table, and create a third table that relates members to codes. Each record in the member table should be a unique record and each record in the code table should be a unique record.
What is the logic behind having the member code in the code table?
It's unnecessary since you can always just do a join if you need both pieces of information.
By having it there you create the potential for integrity issues since you need to update BOTH tables whenever an update is made.
Yes this is a bad idea. Never set up a database to have circular references if you can help it. Now any change has to be made both places and if one place is missed, you have a severe data integrity problem.
First question: can each code be assigned to more than one member? Or can each member have more than one code? (This includes over time as well as at any one moment, if you need historical records of who had what code when.) If the answer to either is yes, then your current structure cannot work. If the answer to both is no, why do you need two tables?
If you can have multiple codes and multiple members, you need a bridging table that has memberid and code id. If you can have multiple members assigned one code, put the code id in the members table. If it is the other way, it should be the memberid in the code table. Then properly set up the foreign key relationship.
@Bill Karwin correctly identifies this as a probable design flaw which will lead to anomalies.
Assuming code and member are distinct entities, I would create a third table...
What is the relationship between a code and member called? An oath? If this is a real life relationship, someone with domain knowledge in the business will be able to give it a name. If not look for further design flaws:
oaths
code_id | member_id
1 | 2
2 | 1
4 | 3
The data suggest that a unique constraint is required for (code_id, member_id).
Once the data is 'scrubbed', drop the columns codes.member_id and members.code_id.
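One possible shape for the scrubbing step, sketched with Python's sqlite3 module. The sample rows deliberately include a mismatch like the anomaly discussed earlier; what to do with disagreeing rows is a business decision, so this sketch only reports them and copies the pairs both columns agree on into oaths. Dropping the two old columns afterwards is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE codes   (id INTEGER PRIMARY KEY, code TEXT, member_id INTEGER);
CREATE TABLE members (id INTEGER PRIMARY KEY, code_id INTEGER, other_info TEXT);
-- Sample rows; member 2 points at code 4 although code 1 points at member 2.
INSERT INTO codes   VALUES (1, '123', 2), (2, '234', 1), (3, '345', NULL), (4, '456', 3);
INSERT INTO members VALUES (1, 2, 'blabla'), (2, 4, 'blabla'), (3, 4, 'blabla');

CREATE TABLE oaths (
    code_id   INTEGER NOT NULL REFERENCES codes(id),
    member_id INTEGER NOT NULL REFERENCES members(id),
    UNIQUE (code_id, member_id)
);
-- Keep only the pairs on which both old columns agree.
INSERT INTO oaths (code_id, member_id)
SELECT c.id, m.id
FROM codes c
JOIN members m ON m.id = c.member_id AND m.code_id = c.id;
""")

oaths = conn.execute(
    "SELECT code_id, member_id FROM oaths ORDER BY code_id"
).fetchall()

# Rows that still need a human decision: the member does not point back.
mismatches = conn.execute("""
    SELECT c.id, c.member_id, m.code_id
    FROM codes c
    JOIN members m ON m.id = c.member_id
    WHERE m.code_id <> c.id
    ORDER BY c.id
""").fetchall()
```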