Just thinking about database design issues. Suppose I have a table like this:
CREATE TABLE LEGACYD.CV_PLSQL_COUNT
(
RUN_DTTM DATE NOT NULL,
TABLE_NAME VARCHAR (80) NOT NULL,
COUNT NUMBER(38) PRIMARY KEY
);
Is it a bad idea to make COUNT a PRIMARY KEY? Generally, how should I decide what should be a primary key?
Candidate keys are based on functional dependencies. A primary key is one of the candidate keys. There's no formal way to look at a set of candidate keys and say, "This one must be the primary key." That decision is based on practical matters, not on formal logic.
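Because candidate keys follow from functional dependencies, one way to sanity-check a claimed dependency is to look for rows that contradict it. Below is a minimal Python sketch; the helper names `is_unique` and `violates_fd` and the sample rows are hypothetical, made up for illustration. Data can only refute a dependency: a clean result on today's rows doesn't prove the dependency holds in general.

```python
from typing import Iterable, Mapping, Sequence


def violates_fd(rows: Iterable[Mapping[str, object]],
                lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if two rows agree on the lhs columns but differ on rhs.

    False only means the sample is consistent with lhs -> rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # setdefault keeps the first rhs value seen for this lhs value.
        if seen.setdefault(key, val) != val:
            return True
    return False


def is_unique(rows: Sequence[Mapping[str, object]], cols: Sequence[str]) -> bool:
    """True if no two rows share the same values for cols (in this sample)."""
    keys = [tuple(r[c] for c in cols) for r in rows]
    return len(keys) == len(set(keys))


# Hypothetical sample rows shaped like CV_PLSQL_COUNT.
sample = [
    {"RUN_DTTM": "2024-01-01", "TABLE_NAME": "ORDERS",    "COUNT": 10},
    {"RUN_DTTM": "2024-01-01", "TABLE_NAME": "CUSTOMERS", "COUNT": 10},
    {"RUN_DTTM": "2024-01-02", "TABLE_NAME": "ORDERS",    "COUNT": 12},
]

print(is_unique(sample, ["COUNT"]))                    # False: two tables both had 10 rows
print(is_unique(sample, ["TABLE_NAME", "RUN_DTTM"]))   # True for this sample
print(violates_fd(sample, ["COUNT"], ["TABLE_NAME"]))  # True: COUNT 10 maps to two tables
```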
Your table structure tells us these things.
COUNT is unique. No matter how many rows this table has, you'll never find two rows that have the same value for COUNT.
COUNT determines TABLE_NAME. That is, given a value for COUNT, we will forever find one and only one value for TABLE_NAME.
TABLE_NAME is not unique. We expect to find several rows that have the same value for TABLE_NAME.
COUNT determines RUN_DTTM. That is, given a value for COUNT, we will forever find one and only one value for RUN_DTTM.
RUN_DTTM is not unique. We expect to find several rows that have the same value for RUN_DTTM.
The combination of TABLE_NAME and RUN_DTTM is not unique. We expect to find several rows that share the same pair of values for TABLE_NAME and RUN_DTTM.
There are no other determinants. That means that given a value for TABLE_NAME, we'll find multiple unrelated values for COUNT and RUN_DTTM. The same holds if we're given a value for RUN_DTTM, or values for the pair of columns {TABLE_NAME, RUN_DTTM}.
If all those things are true, then COUNT might be a good primary key. But I doubt that all those things are true.
Based only on the column names--a risky way to proceed--I think it's far more likely that the only candidate key is {TABLE_NAME, RUN_DTTM}. I think it's also likely that either RUN_DTTM is misnamed, or RUN_DTTM has the wrong data type. If it's a date, you probably meant to name it RUN_DT; if it's a timestamp, the data type should be TIMESTAMP.
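If {TABLE_NAME, RUN_DTTM} is indeed the only candidate key, the declaration might look like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Oracle, so the column types are SQLite's, and the count column is renamed row_count here (an assumption, purely for readability). Equal counts for different tables are accepted, while a second count for the same table at the same moment is rejected.

```python
import sqlite3

# In-memory SQLite as a stand-in for Oracle; simplified, hypothetical DDL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cv_plsql_count (
        run_dttm   TEXT    NOT NULL,
        table_name TEXT    NOT NULL,
        row_count  INTEGER NOT NULL,   -- ordinary data column, not a key
        PRIMARY KEY (table_name, run_dttm)
    )
""")
conn.execute("INSERT INTO cv_plsql_count VALUES ('2024-01-01 10:00:00', 'ORDERS', 10)")
# The same count for a different table is fine: row_count is just data now.
conn.execute("INSERT INTO cv_plsql_count VALUES ('2024-01-01 10:00:00', 'CUSTOMERS', 10)")

try:
    # A second count for the same table at the same moment violates the key.
    conn.execute("INSERT INTO cv_plsql_count VALUES ('2024-01-01 10:00:00', 'ORDERS', 11)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```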
Help! I have a Postgres table like:
create table events(
event_id bigserial primary key,
reason text not null,
lifecycle_id bigint not null,
...
);
I’ve got two kinds of values in the same column. lifecycle_id bigint not null is populated either with an identifier for another resource (transaction_id) or, if the transaction_id is not present, with a randomly generated number. This number is not unique; it is used to connect multiple rows. That is, there can be several rows with the same lifecycle_id, and we will want to query them.
Obviously this is not great, because it’ll inevitably lead to collisions. So I’m introducing a nullable field called transaction_id to the table and actually populating it. I will also start generating the future values for the lifecycle field. But this isn’t enough for me. I’d like to find a way to migrate the lifecycle field into something less “hacky”: either a sequence or a serial field.
What would you do in my shoes to not have to maintain collision-management code?
I am a newbie at SQL/PostgreSQL, and I had a conceptual question about foreign keys and keys in general:
Let's say I have two tables: Table A and Table B.
A has a bunch of columns, two of which are A.id, A.seq. The primary key is btree(A.id, A.seq) and it has a foreign key constraint that A.id references B.id. Note that A.seq is a sequential column (first row has value 1, second has 2, etc).
Now say B has a bunch of columns, one of which is the above mentioned B.id. The primary key is btree(B.id).
I have the following questions:
What exactly does btree do? What is the significance of having two column names in the btree rather than just one (as in btree(B.id)).
Why is it important that A references B instead of B referencing A? Why does order matter when it comes to foreign keys??
Thanks so much! Please correct me if I used incorrect terminology for anything.
EDIT: I am using postgres
A btree index stores values in sorted order, which means you can not only search for a single primary key value but also efficiently search for a range of values:
SELECT ... WHERE id between 6060842 AND 8675309
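To illustrate why sorted storage makes range searches cheap, here is a Python sketch that uses a sorted list and the standard bisect module as a stand-in for the ordered keys in a btree index. It is not how PostgreSQL implements btree (a real B-tree spreads keys over pages linked by a tree of separator keys); it only shows the idea of locating both ends of a BETWEEN range by binary search and reading everything in between. The key values are hypothetical.

```python
import bisect

# Hypothetical primary key values, kept in sorted order as an index would keep them.
keys = sorted([8675309, 42, 6060842, 7000000, 9000000, 6060841])


def range_search(sorted_keys, low, high):
    """Return all keys with low <= key <= high, like WHERE id BETWEEN low AND high."""
    start = bisect.bisect_left(sorted_keys, low)   # index of first key >= low
    end = bisect.bisect_right(sorted_keys, high)   # index of first key > high
    return sorted_keys[start:end]


print(range_search(keys, 6060842, 8675309))  # [6060842, 7000000, 8675309]
```

Both ends are found in logarithmic time, and the matching keys are then read sequentially, which is the property a sorted index gives a range query.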
PostgreSQL also supports other index types, but only btree is supported for a unique index (for example, the primary key).
In your B table, the primary key being a single column id means that only one row can exist for each value in id. In other words, it is unique, and if you search for a value by primary key, it will find at most one row (it may also find zero rows if you don't have a row with that value).
The primary key in your A table is for (id, seq). This means you can have multiple rows for each value of id. You can also have multiple rows for each value of seq as long as they are for different id values. The combination must be unique though. You can't have more than one row with the same pair of values.
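The uniqueness rules for the (id, seq) key can be checked with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database instead of PostgreSQL, and the table name `a` and its rows are hypothetical; for this purpose SQLite enforces a composite PRIMARY KEY in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "a" with the composite primary key (id, seq) from the question.
conn.execute("CREATE TABLE a (id INTEGER NOT NULL, seq INTEGER NOT NULL, PRIMARY KEY (id, seq))")

conn.executemany("INSERT INTO a (id, seq) VALUES (?, ?)",
                 [(1, 1), (1, 2),   # same id, different seq: allowed
                  (2, 1)])          # same seq, different id: allowed

try:
    conn.execute("INSERT INTO a (id, seq) VALUES (1, 2)")  # repeats the pair (1, 2)
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

print(conn.execute("SELECT COUNT(*) FROM a").fetchone()[0], pair_rejected)  # 3 True
```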
When a foreign key in A references B, it means that the row must exist in B before you are allowed to store the row in A with the same id value. But the reverse is not necessary.
Example:
Suppose B is for users and A is for phones. You must store a user before you can store a phone record for that user. You can store one or more phones for that user, one per row in the A table. We say that a row in A therefore references the user row in B, meaning, "this phone belongs to user #1234."
But the reverse is not restricted. A user might have no phones (at least not known to this database), so there is no requirement for B to reference A. In other words, it is not required for a user to have a phone. You can store a row in B (the user) even if there is no row in A (the phones) for that user.
The reference also means you are not allowed to DELETE FROM B WHERE id = ? if there are rows in another table that reference that given row in B. Deleting that user would cause those other rows to become orphaned. No one would be able to know who those phones belonged to, if the user row they reference is deleted.
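The users/phones rules above can be sketched with Python's sqlite3 module as a stand-in for PostgreSQL. Assumptions: an in-memory SQLite database, hypothetical tables `b` (users) and `a` (phones), and made-up values. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, whereas PostgreSQL enforces them without any such setting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

# Hypothetical tables from the example: b = users (parent), a = phones (child).
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE a (
        id    INTEGER NOT NULL REFERENCES b (id),
        seq   INTEGER NOT NULL,
        phone TEXT,
        PRIMARY KEY (id, seq)
    )
""")


def attempt(sql, params=()):
    """Run one statement; return True if accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = {
    "phone before user": attempt("INSERT INTO a VALUES (1234, 1, '555-0100')"),
    "add user 1234": attempt("INSERT INTO b VALUES (1234, 'user 1234')"),
    "add user with no phones": attempt("INSERT INTO b VALUES (5678, 'user 5678')"),
    "phone after user": attempt("INSERT INTO a VALUES (1234, 1, '555-0100')"),
    "delete referenced user": attempt("DELETE FROM b WHERE id = ?", (1234,)),
    "delete unreferenced user": attempt("DELETE FROM b WHERE id = ?", (5678,)),
}
for step, accepted in outcomes.items():
    print(f"{step}: {'accepted' if accepted else 'rejected'}")
```

With the default rules, deleting a referenced user is refused; a foreign key declared with ON DELETE CASCADE would instead delete the referencing phone rows along with the user.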
To your questions:
There are several strategies to implement unique keys. The most common one is to use an index that is "unique" using a "b-tree" strategy. That's what "btree" means in PostgreSQL.
Having two columns in a key just depends on how you want to design your table. When you have a key with more than one column that is called a "composite key".
When A references B, the columns in B must represent a "key". The columns in A do not represent a key, but just a reference to one. In fact the values in A for that column can be repeated; that is, multiple rows in A can point to the same row in B.
Your data structure makes no sense. Why would the primary key of A have both id and name? Normally it would just be id. In some data models, you might have a version or timestamp added. I can't think of a reasonable data model where name would also be included.
In addition, B's foreign key would have to be to both id and name.
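But your question is: what is btree for? Most databases don't have such an option. A primary key would typically be expressed either inline or as a named constraint:
create table t (id int primary key);
-- or, as a named table constraint:
create table t (id int, constraint unq_t_id primary key (id));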
btree is a type of index; in fact, it is the default index type in all databases that I'm aware of. Databases such as Postgres, which offer a plethora of index types, report which type backs each index. In Postgres a primary key is always enforced by a unique btree index, which is why psql's \d output shows it as, for example, PRIMARY KEY, btree (id).
Sometimes there are tables in an application that have only one column each, and the values in that column are unique. Examples are: a table for country names, a table for product names (up to 60 characters long, say), a table for company codes (3 characters long and determined by the user), a table for address types (say, billing, delivery), etc.
For tables like these, as the records are unique and not null, the only column can be used as the primary key, technically speaking.
So my question is, is it good enough to use that column as the primary key for the table? Or, is it still desirable to add another column (country_id, product_id, company_id, addresstype_id) as the primary key for the table? Why?
Thanks in advance for any advice.
There is always a debate between using surrogate keys and composite keys as the primary key. Using composite primary keys always introduces some complexity to your database design, and so to your application.
Suppose you have another table that needs a direct relationship with your resulting table (the billing table). In the composite-key scenario, the related table needs all 4 columns in order to connect to the billing table. If you use a surrogate key instead, you have a single identity column (simplicity), and you can create a unique constraint on (country_id, product_id, company_id, addresstype_id).
But it is hard to say that one approach is better than the other, because both have pros and cons.
You can check This for more information
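A minimal sketch of the surrogate-key layout described in this answer, using Python's sqlite3 module with an in-memory database as a stand-in. The table names `billing` and `billing_note` and the values are hypothetical. The single `billing_id` column is all a related table needs to reference, while the UNIQUE constraint still stops the four natural columns from repeating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Hypothetical "billing" table: a surrogate id is the primary key, and the
# four natural columns are still protected by a UNIQUE constraint.
conn.execute("""
    CREATE TABLE billing (
        billing_id     INTEGER PRIMARY KEY,   -- surrogate key, auto-assigned by SQLite
        country_id     INTEGER NOT NULL,
        product_id     INTEGER NOT NULL,
        company_id     INTEGER NOT NULL,
        addresstype_id INTEGER NOT NULL,
        UNIQUE (country_id, product_id, company_id, addresstype_id)
    )
""")
# A related table needs only one column to point at a billing row.
conn.execute("""
    CREATE TABLE billing_note (
        note_id    INTEGER PRIMARY KEY,
        billing_id INTEGER NOT NULL REFERENCES billing (billing_id),
        note       TEXT
    )
""")

cur = conn.execute("INSERT INTO billing (country_id, product_id, company_id, addresstype_id) "
                   "VALUES (1, 2, 3, 4)")
billing_id = cur.lastrowid
conn.execute("INSERT INTO billing_note (billing_id, note) VALUES (?, 'first note')", (billing_id,))

try:
    # Same four natural values again: rejected by the UNIQUE constraint.
    conn.execute("INSERT INTO billing (country_id, product_id, company_id, addresstype_id) "
                 "VALUES (1, 2, 3, 4)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(billing_id, duplicate_rejected)  # 1 True
```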
This may seem like a simple question, but I am stumped:
I have created a database about cars (in Oracle SQL Developer). Among other tables, I have one called Manufacturer and one called Parentcompany.
Since some manufacturers are owned by bigger corporations, I will also show them in my database.
The parentcompany table is the "parent table" and the Manufacturer table the "child table".
For both tables I have created columns, and each table has its own primary key.
For some reason, when I inserted the values for my columns, I was able to use the same value for the primary key of both Manufacturer and Parentcompany.
The column: ManufacturerID is primary Key of Manufacturer. The value for this is: 'MBE'
The column: ParentcompanyID is primary key of Parentcompany. The value for this is 'MBE'
Both have the same value. Do I have a problem with the thinking logic?
Or do I just not understand how primary keys work?
Does a primary key only need to be unique in a table, and not the database?
I would appreciate it if someone shed light on the situation.
A primary key is unique for each table.
Have a look at this tutorial: SQL - Primary key
A primary key is a field in a table which uniquely identifies each row/record in a database table. Primary keys must contain unique values. A primary key column cannot have NULL values.
A table can have only one primary key, which may consist of single or multiple fields. When multiple fields are used as a primary key, they are called a composite key.
If a table has a primary key defined on any field(s), then you cannot have two records having the same value of that field(s).
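That per-table scope can be checked directly. This Python sketch uses an in-memory SQLite database (a stand-in for Oracle) with simplified, hypothetical versions of the two tables from the question: 'MBE' is accepted once in each table, but a second 'MBE' in the same table is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the two tables from the question.
conn.execute("CREATE TABLE parentcompany (parentcompanyid TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE manufacturer (manufacturerid TEXT PRIMARY KEY, name TEXT)")

# The same key value in two different tables: each PK is checked only within its own table.
conn.execute("INSERT INTO parentcompany VALUES ('MBE', 'Example parent')")
conn.execute("INSERT INTO manufacturer VALUES ('MBE', 'Example manufacturer')")

try:
    conn.execute("INSERT INTO manufacturer VALUES ('MBE', 'Another')")  # duplicate within one table
    second_mbe_accepted = True
except sqlite3.IntegrityError:
    second_mbe_accepted = False

print(second_mbe_accepted)  # False
```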
A primary key is unique per table. You can use the same PK value in every separate table in the DB. That actually happens often, since a PK is frequently an incrementing number representing the ID of a row: 1, 2, 3, 4...
For your case, a more common implementation would be a hierarchical table called Company with the fields company_name and parent_company_name. If a company has a parent, its parent_company_name field holds a value from the company_name column.
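A sketch of that single hierarchical Company table, again using Python's sqlite3 module as a stand-in for Oracle. The company names are hypothetical. The parent_company_name column is a foreign key back to the same table, and NULL means the company has no parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# One Company table; parent_company_name points back at company_name in the same table.
conn.execute("""
    CREATE TABLE company (
        company_name        TEXT PRIMARY KEY,
        parent_company_name TEXT REFERENCES company (company_name)  -- NULL: no parent
    )
""")
conn.executemany("INSERT INTO company VALUES (?, ?)", [
    ("Parent Group", None),        # hypothetical parent company
    ("Brand A", "Parent Group"),   # hypothetical manufacturers owned by it
    ("Brand B", "Parent Group"),
    ("Independent Co", None),
])

# All companies whose parent is "Parent Group".
children = [name for (name,) in conn.execute(
    "SELECT company_name FROM company WHERE parent_company_name = ? ORDER BY company_name",
    ("Parent Group",))]
print(children)  # ['Brand A', 'Brand B']
```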
There are several reasons why the same value in two different PKs might work out with no problems. In your case, it seems to flow naturally from the semantics of the data.
A row in the Manufacturers table and a row in the ParentCompany table both appear to refer to the same thing, namely a company. In that case, giving a company the same id in both tables is not only possible, but actually useful. It represents a one-to-one correspondence between manufacturers and parent companies without adding extra columns to serve as FKs.
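One common way to get this kind of one-to-one correspondence is to make each sub-table's primary key also a foreign key to a general company table. The following Python/sqlite3 sketch assumes hypothetical table layouts (`company`, `manufacturer`, `parentcompany`, each keyed on `company_id`) rather than the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE company (company_id TEXT PRIMARY KEY, name TEXT NOT NULL)")
# Each sub-type table's primary key is also its foreign key to company, so a
# company appears at most once per sub-type table (one-to-one with company).
conn.execute("""
    CREATE TABLE manufacturer (
        company_id TEXT PRIMARY KEY REFERENCES company (company_id)
    )
""")
conn.execute("""
    CREATE TABLE parentcompany (
        company_id TEXT PRIMARY KEY REFERENCES company (company_id)
    )
""")

conn.execute("INSERT INTO company VALUES ('MBE', 'Example company')")  # hypothetical row
conn.execute("INSERT INTO manufacturer VALUES ('MBE')")
conn.execute("INSERT INTO parentcompany VALUES ('MBE')")  # same id, same company

try:
    conn.execute("INSERT INTO manufacturer VALUES ('XYZ')")  # no such company
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

row = conn.execute("""
    SELECT c.name FROM company c
    JOIN manufacturer m ON m.company_id = c.company_id
    JOIN parentcompany p ON p.company_id = c.company_id
""").fetchone()
print(row[0], orphan_accepted)  # Example company False
```

Because the shared id is both the key and the reference, the join needs no extra FK columns in the sub-tables.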
Thanks for the quick answers!
I think I know what to do now. I will create a general company table, in which all companies will be stored. Then, as I go along, I will create specific company tables, like Manufacturer and Parentcompany, that reference a company in the company table.
To clarify, the only column I would put into the sub-company tables is a column with a foreign key referencing a column of the company table, yes?
For the primary key, I was just confused, because I hear so much about the key needing to be unique and never having the same value as another. So this condition applies only within a table, not across the whole database. Thanks for the clarification!
I have a table with 8 columns and would like to know the smallest possible unique key for this table (there are no indexes).
This is as distinct as it gets:
select count(*)
from (select distinct id1, id2, id3,
id4, id5, id6,
id7, id8
from mytable);
Is there a quick and easy way to figure out what column combination is unique for this table? How to proceed?
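The distinct-count query above can be generalized by trying every column combination in order of size and stopping at the smallest size where the distinct count equals the row count. This is a brute-force Python sketch using the sqlite3 module; the function name and the sample table are hypothetical, table and column names are interpolated into SQL and must therefore be trusted, and, like any data-driven check, the result only describes the rows loaded right now. With 8 columns there are 255 non-empty combinations to try in the worst case.

```python
import sqlite3
from itertools import combinations


def minimal_unique_column_sets(conn, table, columns):
    """Return the smallest column combinations whose values are distinct across all rows.

    Reflects only the rows loaded right now. Identifiers are interpolated
    directly into SQL, so they must come from a trusted source.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for size in range(1, len(columns) + 1):
        found = []
        for combo in combinations(columns, size):
            col_list = ", ".join(combo)
            # DISTINCT treats NULLs as equal, as the query in the question does.
            distinct = conn.execute(
                f"SELECT COUNT(*) FROM (SELECT DISTINCT {col_list} FROM {table})"
            ).fetchone()[0]
            if distinct == total:
                found.append(combo)
        if found:
            return found
    return []  # even all columns together repeat: the table holds duplicate rows


# Hypothetical 3-column sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id1, id2, id3)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [(1, 'a', 10), (1, 'b', 10), (2, 'a', 20)])
result = minimal_unique_column_sets(conn, "mytable", ["id1", "id2", "id3"])
print(result)  # [('id1', 'id2'), ('id2', 'id3')]
```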
No, as people have said, there is no quick and easy way. As Barmar commented:
Any answer you get will be dependent on the data that happens to be loaded at the time. Any change made to the table could invalidate the result
Additionally, because you have not previously had a primary key, searching for uniqueness could easily come up with a false positive. You haven't been enforcing uniqueness, so the set of columns that should be unique might not be.
A unique key is determined by the data within your table. You need to understand the data in order to determine what should be unique.
There should be a natural way of having a unique key; e.g., if you've got a table of OS users, then the unique key should probably be their username. Conversely, if you have a table of employees, there's no way to guarantee uniqueness across employee name (or any other attribute or set of attributes), so you will need to create a surrogate key.
Only you can determine this; study your data and you should be able to work it out.
Generally:
Every table should have a primary key, and it should be created at the same time as the table.
There is nothing more important than understanding the data within your database. You are not able to effectively use the database until you understand what it is that you're looking at.