Database design for a column whose values come from a constant list of strings - sql

For database design, if a column's value comes from a constant list of strings (such as a status or type), should I create a new table referenced by a foreign key, or just store the plain strings in the same table?
For example, I have an orders table with a status column:
------------------------------
| id | price | status      |
------------------------------
| 1  | 10.00 | pending     |
| 2  | 03.00 | in_progress |
| 3  | xx.xx | done        |
An alternative to the above is an order_status table, with a status_id stored in the orders table. I'm not sure if another table is necessary here.

If it's more than just a few different values and/or values are frequently added, you should go with a normalized data model, i.e. a separate table.
Otherwise you might also go with a plain column, but you need to add a CHECK (status IN ('pending', 'in_progress', 'done')) constraint to avoid wrong data. This way you get the same consistency without the FK.
To save space you might use abbreviations (one or a few characters, e.g. 'p', 'i', 'd'), but not meaningless numbers (1, 2, 3). Resolving the long values can be done at the view level using CASE.
ENUMs are proprietary, so IMHO it's better to avoid them...
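A minimal sketch of that approach, reusing the table and column names from the question (the abbreviation scheme and the view name are just examples):

CREATE TABLE orders (
    id     INT PRIMARY KEY,
    price  DECIMAL(10,2),
    -- 'p' = pending, 'i' = in_progress, 'd' = done
    status CHAR(1) NOT NULL CHECK (status IN ('p', 'i', 'd'))
);

-- Resolve the abbreviations at the view level
CREATE VIEW orders_view AS
SELECT id,
       price,
       CASE status
           WHEN 'p' THEN 'pending'
           WHEN 'i' THEN 'in_progress'
           WHEN 'd' THEN 'done'
       END AS status
FROM orders;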

It's not a good practice to create a table just for static values.
Instead, you could use the ENUM type, which has a preset list of allowed values, as in this example:
CREATE TABLE orders (
    id INT,
    price DOUBLE,
    status ENUM('pending', 'in progress', 'done')
);

There are pros and cons to each solution; you need to pick the best one for your own project, and you may have to switch later if the initial choice turns out to be bad.
In your case, storing the status directly can be good enough. But if you want to prevent invalid statuses from being stored in your database, or you have very long status text, you may want to store the statuses separately with a foreign key constraint.
ENUM is another solution. However, if you need a new status later, you have to change your table definition, which can be a very bad thing.
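For example, in MySQL, adding a new status later means redefining the whole column (a sketch, assuming the orders table above):

ALTER TABLE orders
    -- 'cancelled' is a hypothetical new value
    MODIFY status ENUM('pending', 'in progress', 'done', 'cancelled');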

If the status has extra data associated with it, like display order or a colour, then you would need a separate table. Also, choosing pre-entered values from a table prevents semi-duplicate values (for example, one person might write "in progress" whereas another might write "in_progress" or "progressing") and aids in searching for orders with the same status.
I would go for a separate table, as it allows more capabilities and reduces errors.

I would use an order_status table with the literal as the primary key. Then in your orders table, cascade updates on the status column in case you modify the literals in the order_status table. This way you have data consistency and avoid join queries.
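A sketch of that layout (column types are assumed):

CREATE TABLE order_status (
    status VARCHAR(20) PRIMARY KEY
);

INSERT INTO order_status (status) VALUES ('pending'), ('in_progress'), ('done');

CREATE TABLE orders (
    id     INT PRIMARY KEY,
    price  DECIMAL(10,2),
    status VARCHAR(20) NOT NULL,
    FOREIGN KEY (status) REFERENCES order_status (status) ON UPDATE CASCADE
);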

Related

How best to organise data that is dependent on a field in PostgreSQL?

I am using PostgreSQL with GraphQL to deal with some data for posts on a website. The posts are slightly complex and I am just not sure of the best way to organise the data. The reason is that the posts can be of different types, and as a result the data structures for the different types vary. For example, I have a text type and a video type, where the video type needs a URL, a thumbnail and a description, and the text type just needs a description. I'm not sure what the best way to organise the data would be.
Would it be possible to somehow link multiple tables like below? In other words, would it be possible to query and get the right data depending on the ItemID in the posts table, and then pull it out from the corresponding table?
The table of posts would look like:
|----|-------|---------|--------|
| ID | Type  | Title   | ItemID |
|----|-------|---------|--------|
| 1  | Text  | My text | 1234   |
|----|-------|---------|--------|
| 2  | Video | A vid   | 3456   |
|----|-------|---------|--------|
The Video table would look like:
|------|------|------|-----------|
| ID   | Desc | URL  | Thumbnail |
|------|------|------|-----------|
| 3456 | .... | .... | ....      |
|------|------|------|-----------|
The Text table would look like:
|------|------|
| ID   | Desc |
|------|------|
| 1234 | .... |
|------|------|
The other way I can think of is just having a single table for the posts, with all of the possible fields present but not required, and then organising the data in the front end as needed.
Any thoughts or recommendation on how to best deal with this would be greatly appreciated!
I'm not so sure about your design, but then I don't know the total spec.
You're going to have to use some code alongside your database design. To start with, a web page table; I'm presuming your website has more than one page with posts.
WEBPAGE Table: WebPageID (PK), WebPageDesc
Then, we add a table for posts, for each webpage.
This assigns an ID to each post.
POST Table: PostID (PK), WebPageID (FK)
If you only have one page, then you obviously don't need the WEBPAGE table and you can alter the POST table to just: POST Table: PostID (PK)
Then, as you've worked out, a table to store all video posts, and another for text.
VIDEOPOSTS Table: VideoPostID (PK), VideoDesc, VideoURL
TEXTPOSTS Table: TextPostID (PK), TextDesc
Now you just need a couple of tables to link the different types of posts back to the webpage. A composite primary key in both saves the need for an ID column.
POSTSVIDEOPOSTS Table: PostID (PK) (FK to post table), VideoPostID (PK) (FK to videoposts table)
POSTSTEXTPOSTS Table: PostID (PK) (FK to post table), TextPostID (FK) (FK to textposts table)
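In PostgreSQL DDL that might look roughly like this (column types are assumptions):

CREATE TABLE webpage (
    webpageid   SERIAL PRIMARY KEY,
    webpagedesc TEXT
);

CREATE TABLE post (
    postid    SERIAL PRIMARY KEY,
    webpageid INTEGER REFERENCES webpage (webpageid)
);

CREATE TABLE videoposts (
    videopostid SERIAL PRIMARY KEY,
    videodesc   TEXT,
    videourl    TEXT
);

CREATE TABLE textposts (
    textpostid SERIAL PRIMARY KEY,
    textdesc   TEXT
);

-- link tables, composite primary keys instead of surrogate IDs
CREATE TABLE postsvideoposts (
    postid      INTEGER REFERENCES post (postid),
    videopostid INTEGER REFERENCES videoposts (videopostid),
    PRIMARY KEY (postid, videopostid)
);

CREATE TABLE poststextposts (
    postid     INTEGER REFERENCES post (postid),
    textpostid INTEGER REFERENCES textposts (textpostid),
    PRIMARY KEY (postid, textpostid)
);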
Further thinking:
You can add a deleted flag next to each post, to decide whether or not it should be included in the web page output, depending on whether a user can delete posts.
With the above design, it's going to be difficult to sort the posts into the correct order, as you are retrieving them from two separate tables. This design is normalised though, I think.
I'm just not sure it's 100% correct. I'd need to see more of the spec to be able to design it better. Perhaps if there is a datetime the post was made, that would make sorting easier?
But then I'm not 100% confident the original design in your original post using a Type column in the post table is correct either...
Does the design need to be normalised?
Happy for the above to be criticised by others, I'm pretty sure it needs further work!
Your attempted check is on the right track, but not quite there. First off, the structure "type IS 'video'" is invalid: null values are checked with IS, but other values are checked with comparison operators. Secondly, as Laurenze indicates, you must ensure that one and only one of the foreign key identifiers is null. With these conditions we get:
check ( (    (vedioidentifier is null     and textidentifier is not null)
          or (vedioidentifier is not null and textidentifier is null) )
    and (    (type = 'video' and vedioidentifier is not null)
          or (type = 'text'  and textidentifier is not null) ) )
Caution against CamelCase: Postgres maintains CamelCase names for identifiers only if they are also enclosed in double quotes ("). If you do that, you must ALWAYS use double quotes for every reference. If not double quoted, Postgres folds all identifiers to lower case. A much better approach is to use snake_case identifier names.
I see nothing wrong with your data model, except that I would use two different columns instead of itemid. Each of them has a foreign key, one to the video table and one to the text table. A check constraint makes sure that only one of them IS NOT NULL and that type matches, something like
CHECK (type = 'text' AND text_id IS NOT NULL AND video_id IS NULL OR
       type = 'video' AND text_id IS NULL AND video_id IS NOT NULL)
This way you can have foreign keys between the tables, which is essential for data integrity.
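A minimal sketch of that design (the column types, and the texts/videos table names, are assumptions):

CREATE TABLE texts  (id SERIAL PRIMARY KEY, description TEXT);
CREATE TABLE videos (id SERIAL PRIMARY KEY, description TEXT, url TEXT, thumbnail TEXT);

CREATE TABLE posts (
    id       SERIAL PRIMARY KEY,
    type     TEXT NOT NULL,
    title    TEXT,
    text_id  INTEGER REFERENCES texts (id),
    video_id INTEGER REFERENCES videos (id),
    -- exactly one of the two identifiers is set, and it matches the type
    CHECK (type = 'text' AND text_id IS NOT NULL AND video_id IS NULL OR
           type = 'video' AND text_id IS NULL AND video_id IS NOT NULL)
);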

Turn two database tables into one?

I am having a bit of trouble modelling a relational database for an inventory management system. For now, it only has 3 simple tables:
Product
ID | Name | Price
Receivings
ID | Date | Quantity | Product_ID (FK)
Sales
ID | Date | Quantity | Product_ID (FK)
As Receivings and Sales are identical, I was considering a different approach:
Product
ID | Name | Price
Receivings_Sales (the name doesn't matter)
ID | Date | Quantity | Type | Product_ID (FK)
The type column would identify whether it was a receiving or a sale.
Can anyone help me choose the best option, pointing out the advantages and disadvantages of either approach?
The first one seems reasonable because I am thinking in an ORM way.
Thanks!
Personally I prefer the first option, that is, separate tables for Sales and Receiving.
The two biggest disadvantages of option number 2, merging the two tables into one, are:
1) Inflexibility
2) Unnecessary filtering during use
First, on inflexibility. If your requirements expand (or you simply overlooked something), then you will have to break up your schema or you will end up with unnormalized tables. For example, let's say your sales now need to include the Sales Clerk/Person who handled the sales transaction; that obviously has nothing to do with 'Receiving'. And if you do retail or wholesale sales, how would you accommodate that in your merged table? How about discounts or promos? I am just identifying the obvious here. Now, let's go to Receiving. What if we want to tie our receiving to a Purchase Order? Obviously, purchase order details like P.O. Number, P.O. Date, Supplier Name etc. would not belong under Sales but relate more to Receiving.
Secondly, on unnecessary filtering during use. If you have a merged table and you only want to use the Sales (or Receiving) portion of it, then you have to filter out the Receiving portion either in your back-end or your front-end program. Whereas with separate tables you just deal with one table at a time.
Additionally, you mentioned ORM; the first option fits that best, because an object or entity should be distinct from other entities/objects.
If the tables really are and always will be identical (and I have my doubts), then name the unified table something more generic, like "InventoryTransaction", and then use negative numbers for one of the transaction types: probably sales, since that would correctly mark your inventory in terms of keeping track of stock on hand.
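A sketch of that single-table design (column types assumed; the quantity is signed):

CREATE TABLE InventoryTransaction (
    ID         INT PRIMARY KEY,
    Date       DATE NOT NULL,      -- quote [Date]/"Date" if your database treats it as reserved
    Quantity   INT NOT NULL,       -- negative for sales, positive for receivings
    Product_ID INT NOT NULL REFERENCES Product (ID)
);

-- Stock on hand then falls out of a simple aggregate:
SELECT Product_ID, SUM(Quantity) AS stock_on_hand
FROM InventoryTransaction
GROUP BY Product_ID;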
The fact that headings are the same is irrelevant. Seeking to use a single table because headings are the same is misconceived.
-- person [source] loves person [target]
LOVES(source,target)
-- person [source] hates person [target]
HATES(source,target)
Every base table has a corresponding predicate aka fill-in-the-[named-]blanks statement describing the application situation. A base table holds the rows that make a true statement.
Every query expression combines base table names via JOIN, UNION, SELECT, EXCEPT, WHERE condition, etc and has a corresponding predicate that combines base table predicates via (respectively) AND, OR, EXISTS, AND NOT, AND condition, etc. A query result holds the rows that make a true statement.
Such a set of predicate-satisfying rows is a relation. There is no other reason to put rows in a table.
(The other answers here address, as they must, proposals for and consequences of the predicate that your one table could have. But if you didn't propose the table because of its predicate, why did you propose it at all? The answer is, since not for the predicate, for no good reason.)

redundant column

I have a database that has two tables, these tables look like this
codes
id | code | member_id
1 | 123 | 2
2 | 234 | 1
3 | 345 |
4 | 456 | 3
members
id | code_id | other info
1 | 2 | blabla
2 | 1 | blabla
3 | 4 | blabla
The basic idea is that if a code is taken, its member_id field is filled in. However, this creates a circular link (members points to codes, codes points to members). Is there a different way of doing this? Is this actually a bad thing?
Update
To answer your questions: there are three different code tables with approx 3.5 million codes each, and each table is searched depending on different criteria. If the member_id column is empty then the code is unclaimed; otherwise the code is claimed. This is done so that when we are searching the database we do not need to include another table to tell whether a code is claimed.
the members table contains the claimants for every single code, so all 10.5 million members
the additional info has things like mobile, flybuys.
the mobile is how we identify the member, but each entry is considered a different member.
It's a bad thing because you can end up with anomalies. For example:
codes
id | code | member_id
1 | 123 | 2
members
id | code_id | other info
2 | 4 | blabla
See the anomaly? Code 1 references its corresponding member, but that member doesn't reference the same code in return. The problem with anomalies is you can't tell which one is the correct, intended reference and which one is a mistake.
Eliminating redundant columns reduces the chance for anomalies. This is a simple process that follows a few very well defined rules, called rules of normalization.
In your example, I would drop the codes.member_id column. I infer that a member must reference a code, but a code does not necessarily reference a member. So I would make members.code_id reference codes.id. But it could go the other way; you don't give enough information for the reader to be sure (as @OMG Ponies commented).
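A sketch of that change (the constraint name is made up):

ALTER TABLE codes DROP COLUMN member_id;

ALTER TABLE members
    ADD CONSTRAINT fk_members_code
    FOREIGN KEY (code_id) REFERENCES codes (id);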
Yeah, this is not good because it presents opportunities for data integrity problems. You've got a one-to-one relationship, so either remove Code_id from the members table, or member_id from the codes table. (in this case it seems like it would make more sense to drop code_id from members since it sounds like you're more frequently going to be querying codes to see which are not assigned than querying members to see which have no code, but you can make that call)
You could simply drop the member_id column and use a foreign key relationship (or its absence) to signify the relationship or lack thereof. The code_id column would then be used as a foreign key to the code. Personally, I do think it's bad simply because it makes it more work to ensure that you don't have corrupt relationships in the DB -- i.e., you have to check that the two columns are synchronized between the tables -- and it doesn't really add anything in the general case. If you are running into performance problems, then you may need to denormalize, but I'd wait until it was definitely a problem (and you'd likely replicate more than just the id in that case).
It depends on what you're doing. If each member always gets exactly one unique code then just put the actual code in the member table.
If there are a set of codes and several members share a code (but each member still has just one) then remove the member_id from the codes table and only store the unique codes. Access a specific code through a member. (you can still join the code table to search on codes)
If a member can have multiple codes, then remove the code_id from the member table and the member_id from the code table, and create a third table that relates members to codes. Each record in the member table should be unique, and each record in the code table should be unique.
What is the logic behind having the member code in the code table?
It's unnecessary since you can always just do a join if you need both pieces of information.
By having it there you create the potential for integrity issues since you need to update BOTH tables whenever an update is made.
Yes, this is a bad idea. Never set up a database to have circular references if you can help it. Now any change has to be made in both places, and if one place is missed, you have a severe data integrity problem.
First question: can each code be assigned to more than one member? Or can each member have more than one code? (This includes over time as well as at any one moment, if you need historical records of who had what code when.) If the answer to either is yes, then your current structure cannot work. If the answer to both is no, why do you need two tables?
If you can have multiple codes and multiple members, you need a bridging table that has member id and code id. If you can have multiple members assigned one code, put the code id in the members table. If it is the other way around, it should be the member id in the code table. Then properly set up the foreign key relationship.
@Bill Karwin correctly identifies this as a probable design flaw which will lead to anomalies.
Assuming code and member are distinct entities, I would create a third table...
What is the relationship between a code and a member called? An oath? If this is a real-life relationship, someone with domain knowledge in the business will be able to give it a name. If not, look for further design flaws:
oaths
code_id | member_id
1 | 2
2 | 1
4 | 3
The data suggest that a unique constraint is required for (code_id, member_id).
Once the data is 'scrubbed', drop the columns codes.member_id and members.code_id.
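A sketch of that bridging table (column types assumed):

CREATE TABLE oaths (
    code_id   INT NOT NULL,
    member_id INT NOT NULL,
    UNIQUE (code_id, member_id),
    FOREIGN KEY (code_id)   REFERENCES codes (id),
    FOREIGN KEY (member_id) REFERENCES members (id)
);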

Adding a unique constraint on calculated value of a column

I'm not exactly sure how to phrase this, but here goes...
We have a table structure like the following:
Id | Timestamp | Type | Clientid | ..others..
001 | 1234567890 | TYPE1 | CL1234567 |.....
002 | 1234561890 | TYPE1 | CL1234567 |.....
Now for the data given above... I would like to have a constraint so that those 2 rows could not exist together. Essentially, I want the table to be
Unique for (Type, ClientId, CEIL(Timestamp/10000)*10000)
I don't want rows with the same data created within X time of each other to be added to the db, i.e. I would like a constraint violation in this case. The problem is that the above constraint is not something I can actually create.
Before you ask, I know, I know.... why right? Well I know a certain scenario should not be happening, but alas it is. I need a sort of stop gap measure for now, so I can buy some time to investigate the actual matter. Let me know if you need additional info...
Yes, Oracle supports calculated columns:
SQL> alter table test add calc_column as (trunc(timestamp/10000));
Table altered.
SQL> alter table test
add constraint test_uniq
unique (type, clientid, calc_column);
Table altered.
should do what you want.
AFAIK, Oracle does not support computed columns like SQL Server does. You can mimic the functionality of a computed column using Triggers.
Here are the steps for this:
1) Add a column called CEILCalculation to your table.
2) On your table, put a trigger that will update CEILCalculation with the value of CEIL(Timestamp/10000)*10000.
3) Create a unique index on the three columns (Type, ClientId, CEILCalculation).
If you do not want to modify the table structure, you can put a BEFORE INSERT TRIGGER on the table and check for validity over there.
http://www.techonthenet.com/oracle/triggers/before_insert.php
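A rough sketch of steps 2) and 3), assuming the table is called orders and the columns are named as above:

CREATE OR REPLACE TRIGGER orders_ceil_calc
    BEFORE INSERT OR UPDATE ON orders
    FOR EACH ROW
BEGIN
    -- keep the helper column in sync with the raw timestamp
    :NEW.CEILCalculation := CEIL(:NEW.Timestamp / 10000) * 10000;
END;
/

CREATE UNIQUE INDEX orders_type_client_ceil_uq
    ON orders (Type, ClientId, CEILCalculation);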

Change all primary keys in access table to new numbers

I have an Access table with an automatic primary key, a date, and other data. The first record starts at 36, due to deleted records. I want to change all the primary keys so they begin at 1 and increment, ordered by the date. What's the best way to do this?
I want to change the table from this:
| TestID | Date | Data |
| 36 | 12/02/09 | .54 |
| 37 | 12/04/09 | .52 |
To this:
| TestID | Date | Data |
| 1 | 12/02/09 | .54 |
| 2 | 12/04/09 | .52 |
EDIT: Thanks for the input and to those who answered. I think some were reading a little too much into my question, which is okay because it still adds to my learning and thinking process. The purpose of my question was two-fold: 1) it would simply be nicer for me to have the PK match the order of my data's dates, and 2) to learn whether something like this is possible for later use, such as if I want to add a new column to the table which numbers the tests, labels the type of test, etc. I am trying to learn a lot at once right now, so I sometimes get confused about where to start. I am building .NET apps and trying to learn SQL and database management, and it is sometimes confusing finding the right info with the different RDBMSs and ways to interact with them.
Following from MikeW, you can use the following SQL command to copy the data from the old to the new table:
INSERT INTO
    NewTable ([Date], Data)
SELECT
    [Date], Data
FROM
    OldTable
ORDER BY
    [Date];
The new TestID will start from 1 if it is an AutoNumber field; TestID is deliberately left out of the column lists so that Access assigns fresh values.
I would create a new table, with autoincrement.
Then select all the existing data into it, ordering by date. That will result in the IDs being recreated from "1".
Then you could drop the original table, and rename the new one.
Assuming no foreign keys - if so you'd have to drop and recreate those too.
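For reference, a sketch of the new table in Access DDL (column types are guesses based on the sample data):

CREATE TABLE NewTable (
    TestID COUNTER PRIMARY KEY,
    [Date] DATETIME,
    Data   DOUBLE
);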
An Autonumber used as a surrogate primary key is not data, but metadata used to do nothing but connect records in related tables. If you need to control the values in that field, then it's data, and you can't use an Autonumber; you have to roll your own autoincrement routine. You might want to look at this thread for a starting point, but code for this is available everywhere Access programmers congregate on the Net.
I agree that auto-generated IDENTITY values should have no meaning, even to the coder, but for education purposes, here's how to reseed the IDENTITY using ADO:
ACC2000: Cannot Change Default Seed and Increment Value in UI
Note that the article is out of date where it says, "there are no options available in the user interface (UI) for you to make this change." In later versions of Access, the SQL DDL can be executed when in ANSI-92 query mode, e.g. something like this:
ALTER TABLE MyTable ALTER COLUMN TestID COUNTER(1, 1);