I am trying to model a new database. One of the requirements is to keep versions of different rows. Here is a sample of two versions of the same object:
ID | UID | Name | Date
--------------------------------------------------------------
1 | 734FD814-024D-4795-AFD0-34FECF89A13A | Alpha | 2013-02-08
2 | 734FD814-024D-4795-AFD0-34FECF89A13A | Bravo | 2013-02-09
In order to have a foreign key that references this table, I need to specify a primary key. The two candidates are ID and UID, the first being an auto-increment number and the second a manually generated identifier that is unique per object.
Limitations:
When selecting ID as primary key:
When a new version of the object is created, all references to the older version become invalid and must be updated
Manually updating all references on each insert is not an option; it is far too heavy
When selecting UID as primary key:
UID is not unique per row, so it cannot be used alone; it must be combined with another field in a composite primary key
All the other fields that could be combined with it may change as well, which would break the foreign key references.
Any suggestions about the best approach (as lightweight as possible) to overcome these limitations?
PS : I am using OrmLite to model the database using POCO objects.
This is a very common scenario in financial applications. An excellent approach is to mark one row as active. For example:
ObjectID, StartDt, EndDt, ...other columns...
where the half-open interval [StartDt, EndDt) marks the time during which the row was the "active" row. You can join like:
join YourTable yt
on yt.ObjectId = otherTable.ObjectID
and yt.StartDt is not null
and yt.EndDt is null -- Select active row
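As a rough sketch (T-SQL-flavoured, with @ObjectID standing in for the object's identifier), creating a new version then means closing the active row and inserting its replacement:

-- Close the currently active row for this object...
UPDATE YourTable
SET EndDt = GETDATE()
WHERE ObjectID = @ObjectID
  AND EndDt IS NULL;

-- ...and insert the new version as the new active row.
INSERT INTO YourTable (ObjectID, StartDt, EndDt, Name)
VALUES (@ObjectID, GETDATE(), NULL, 'Bravo');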
The fields that don't change from version to version (which could be just the ID) could be put into another table; that's the table you link to. The version-specific information then goes in a second table. To make joining to the latest version easier, you could keep an IsLatest flag current in that table.
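A minimal sketch of that split (table names are hypothetical): other tables reference Object, which never changes, while each new version is just a new row in ObjectVersion.

CREATE TABLE Object (
    ID int PRIMARY KEY  -- stable identity; this is what foreign keys point at
);

CREATE TABLE ObjectVersion (
    ID          int IDENTITY PRIMARY KEY,
    ObjectID    int NOT NULL REFERENCES Object (ID),
    Name        varchar(50),
    VersionDate date,
    IsLatest    bit NOT NULL DEFAULT 1  -- clear on the old row when inserting a new version
);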
I have a table place2022 which has a very long CHAR column
timestamp | user_id | pixel_color | coordinate
-----------------+------------------------------------------------------------------------------------------+-------------+------------
17:38:20.021+00 | p0sXpmkcmg1KLiCdK5e4xKdudb1f8cjscGs35082sKpGBfQIw92nZ7yGvWbQ/ggB1+kkRBaYu1zy6n16yL/yjA== | #FF4500 | 371,488
17:38:20.024+00 | Ctar52ln5JEpXT+tVVc8BtQwm1tPjRwPZmPvuamzsZDlFDkeo3+ItUW89J1rXDDeho6A4zCob1MKmJrzYAjipg== | #51E9F4 | 457,493
17:38:20.025+00 | rNMF5wpFYT2RAItySLf9IcFZwOhczQhkRhmTD4gv0K78DpieXrVUw8T/MBAZjj2BIS8h5exPISQ4vlyzLzad5w== | #000000 | 65,986
17:38:20.025+00 | u0a7l8hHVvncqYmav27EARAE6ciLtpUTPXMI33lDrUmtj5Ei3ixlfRuG28KUvs7r5LpeiE/iOKPALVjkILhrYg== | #3690EA | 73,961
The user_ids are already hashes, so all I really care about here is having some sort of id column which is 1-1 with the user_id.
I've counted the number of unique user_ids, which is 10381163, which fits into 24 bits. Therefore, I can compress the id field down to a 32-bit integer using the obvious scheme of "assign 1 to the first new user_id you see, 2 to the second new user_id you see", etc. I don't even care that the user_ids are mapped in the order they're seen: I just need them mapped in an invertible manner to 32-bit ints somehow. I'd also like to persist this mapping somewhere so that, if I want to, I can go backwards.
What would be the best way to achieve this? I imagine that we could create a new table (create table place2022_user_ids as select distinct(user_id) from place2022;?) and then reverse-lookup the user_id column in that table, but I don't know quite how to formulate the queries and also make sure that I'm not doing something ridiculously slow.
I am using postgresql, if it matters.
If you have a recent (>8) version of Postgres you can add an auto-increment id column to an existing table.
ALTER TABLE place2022
ADD COLUMN id SERIAL PRIMARY KEY;
NB: if the table already has a PRIMARY KEY, you will need to drop it first.
See drop primary key constraint in postgresql by knowing schema and table name only
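Alternatively, to get the 1-to-1 user_id mapping the question describes, a sketch along these lines should work (the mapping table name comes from the question; the UNIQUE constraint both enforces the 1-to-1 property and creates the index needed for fast lookups in both directions):

CREATE TABLE place2022_user_ids (
    id      serial PRIMARY KEY,
    user_id text   UNIQUE NOT NULL
);

INSERT INTO place2022_user_ids (user_id)
SELECT DISTINCT user_id FROM place2022;

-- Forward lookup: hash -> compact id
SELECT id FROM place2022_user_ids WHERE user_id = '<hash>';

-- Reverse lookup: compact id -> hash
SELECT user_id FROM place2022_user_ids WHERE id = 42;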
I am developing an application in which I have multiple tables.
Table Name: User_master
|Id (PK,AI) | Name | Email | Phone | Address
Table Name: User_auth
|Id (PK,AI) | user_id(FK_userMaster) | UserName | password | Active_Status
Table Name: Userbanking_details
|Id (PK,AI) | user_id(FK_userMaster) | Bank Name | account Name | IFSC
Now, what I want is that updates to a record should not overwrite it directly; instead, changes should be versioned, so that I can track a log of all previous updates a user has made.
That means if a user updates their address, the previous address should still be stored in the table as history.
I have tried adding the fields version_name, version_latest, and updated_version_of, and inserting a new record on each update, like this:
|Id (PK,AI) | Name | Email | Phone | Address |version_name |version_latest| updated_version_of
1 | ABC |ABC@gm.com|741852|LA |1 |0 |1
2 | ABC |ABC@gm.com|852741|NY |2 |1 |1
Now the problem: the user table is in an FK relationship with the other two tables, so when I update a record, those relationships are lost because of the new ID.
I want to preserve the old data as history, with the newly updated record in effect only for new transactions.
How can I achieve this?
Depending on your use case, you can either add a JSON field to each table for storing previous states, or create an identical history table for each table.
Dump the entire row into the JSON history column every time the user updates anything.
Or insert a new row into the history table for each update of the original.
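For example, a sketch of the history-table variant (T-SQL-style; table and column names follow the question, ChangedAt is added for ordering):

CREATE TABLE User_master_history (
    HistoryId int IDENTITY PRIMARY KEY,
    user_id   int NOT NULL,  -- points back at User_master.Id
    Name      varchar(50),
    Email     varchar(50),
    Phone     varchar(50),
    Address   varchar(100),
    ChangedAt datetime NOT NULL DEFAULT GETDATE()
);

-- Copy the current state into history, then update the live row:
INSERT INTO User_master_history (user_id, Name, Email, Phone, Address)
SELECT Id, Name, Email, Phone, Address FROM User_master WHERE Id = 1;

UPDATE User_master SET Address = 'NewAddress' WHERE Id = 1;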
Storing historical records and current records in the same table is not good practice in a transactional system.
The reasons are:
There will be more I/O, due to scanning a larger number of pages to locate a record
Additional maintenance effort on the table
Transactions get bigger and longer, causing timeout issues
Additional effort in cascading referential integrity changes to child tables
I would suggest keeping historical records in a separate table. You can use the OUTPUT clause to capture the historical records and insert them into that separate table. That way, your referential integrity remains intact. In the historical table, you don't need a PK defined.
Below is a sample using the OUTPUT clause with UPDATE. You can read more about the OUTPUT clause here
DECLARE @Updated TABLE( [ID] int,
                        [Name_old] varchar(50),
                        [Email_old] varchar(50),
                        [Phone_old] varchar(50),
                        [Address_old] varchar(50),
                        [ModifiedDate_old] datetime);

UPDATE User_Master
SET Email = 'NewEmail@Email.com', Name = 'newName', Phone = 'NewPhone',
    Address = 'NewAddress', ModifiedDate = GETDATE()
OUTPUT deleted.Id AS ID, deleted.Name AS Name_old, deleted.Email AS Email_old,
       deleted.Phone AS Phone_old, deleted.Address AS Address_old,
       deleted.ModifiedDate AS ModifiedDate_old
INTO @Updated
WHERE [Id] = 1;

INSERT INTO User_Master_History
SELECT * FROM @Updated;
When I have faced this situation in the past I have solved it in the following ways:
First Method
Recommended method.
Have a second table which acts as a change history. Because you are not adding rows to the main table, your foreign keys maintain integrity.
There are now mechanisms in SQL Server to do this automatically.
SQL Server 2016 Temporal Tables
Change Data Capture (available since SQL Server 2008)
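For instance, a system-versioned temporal table is declared roughly like this (SQL Server 2016+; column names are illustrative), and SQL Server then maintains the history table automatically on every UPDATE and DELETE:

CREATE TABLE dbo.User_Master
(
    Id    int IDENTITY PRIMARY KEY,
    Name  varchar(50),
    Email varchar(50),
    SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.User_Master_History));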
Second Method
I don't recommend this as a good design, but it does work.
Treat one record as the primary record, and this record maintains a foreign key relationship with records in other related tables which are subject to change tracking.
Always update this primary record with any changes thereby maintaining integrity of the foreign keys.
Add a self-referencing key to this table e.g. Parent and a date-of-change column.
Each time the primary record is updated, store the old values into a new record in the same table, and set the Parent value to the id of the primary record. Again the primary record never changes, and therefore your foreign key relationships maintain integrity.
Using the date-of-change column in conjunction with the change history allows you to reconstruct the exact values at any point in time.
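A sketch of that shape (T-SQL-style, names illustrative): the primary record keeps Parent NULL, and each history row points back at it:

CREATE TABLE User_Master (
    Id        int IDENTITY PRIMARY KEY,
    Parent    int NULL REFERENCES User_Master (Id),  -- NULL = primary record
    ChangedOn datetime NOT NULL DEFAULT GETDATE(),
    Name      varchar(50),
    Email     varchar(50),
    Phone     varchar(50),
    Address   varchar(100)
);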
So I've tried searching and have yet to find out how to grasp this entirely.
I'm reorganising my database because I was storing user ids as comma-separated values in a column within each row to control permissions. To me, that seemed like a better and faster (hardware-wise) way, but I'm moving towards the "proper" way now.
I understand that you need 3 tables. This is what I have.
Table 1. members -> ID | user_name
Table 2. teams -> ID | team_name
Table 3. team_members -> ID | team_fk | member_fk
I understand how to store data in these tables and use SQL to display it. What I'm confused about is why I have to link (relate) the columns to the IDs of the other tables. I could get the data without using the relation, so I'm confused by what it even does.
Furthermore, I would like to have multiple values that determine permissions for each team. Would I do:
Table 3. team_members -> ID | team_fk | member_fk | leader_fk | captain_fk
^setting 0 or 1(true or false) for the leader and captain.
Or would I create a table(like team_leaders, team_captains) for each permission?
Thanks for the help!
Ryan
It seems that "leader", "captain" and "regular member" are roles in your team. So you can create a table team_roles, or just store the role as a string on your relation table, i.e.
team_members -> ID | team_fk | member_fk | role
The key thing about this is to keep your database [normalised](https://en.wikipedia.org/wiki/Database_normalization). It is really easier to work with a normalised database in most cases.
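A sketch of that relation table (MySQL-flavoured; the ENUM values are just examples):

CREATE TABLE team_members (
    id        int AUTO_INCREMENT PRIMARY KEY,
    team_fk   int NOT NULL,
    member_fk int NOT NULL,
    role      ENUM('member', 'captain', 'leader') NOT NULL DEFAULT 'member'
);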
What I'm confused about is why I have to link(relation) the columns to the ID's of the other tables. I could get the data without using the relation.
You don't have to declare columns as foreign keys. It's just a good idea. It serves the following purposes:
It tells readers of the schema how the tables are related to each other. If you name the columns well, this is redundant -- team_fk is pretty obviously a reference to the teams table.
It enables automatic integrity checks by the database. If you try to create a team_members row that contains a team_fk or member_fk that isn't in the corresponding table, it will report an error. Note that in MySQL, this checking is only done by the InnoDB engine, not MyISAM.
In MySQL/InnoDB, indexes are automatically created for the foreign key columns, which helps to optimize queries between the tables (see the sketch below).
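For example (MySQL/InnoDB syntax, assuming the table and column names from the question):

CREATE TABLE team_members (
    id        int AUTO_INCREMENT PRIMARY KEY,
    team_fk   int NOT NULL,
    member_fk int NOT NULL,
    FOREIGN KEY (team_fk)   REFERENCES teams (id),
    FOREIGN KEY (member_fk) REFERENCES members (id)
) ENGINE=InnoDB;

With these constraints in place, inserting a row whose team_fk has no matching teams row fails immediately instead of silently leaving orphaned data.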
Table 3. team_members -> ID | team_fk | member_fk | leader_fk | captain_fk
If leader and captain are just true/false values, they aren't foreign keys. A foreign key column contains a reference to a key in another table. So I would call these is_leader and is_captain.
But you should only put these values in the team_members table if a team can have multiple captains and leaders. If there's just one of each, they should be in the teams table:
teams -> ID | team_name | leader_fk | captain_fk
where leader_fk and captain_fk are IDs from the members table. This will ensure that you can't inadvertently assign is_captain = 1 to multiple members from the same team.
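Sketched out (again MySQL-flavoured, names from the question), each team row has room for exactly one leader and one captain:

CREATE TABLE teams (
    id         int AUTO_INCREMENT PRIMARY KEY,
    team_name  varchar(50) NOT NULL,
    leader_fk  int NULL,
    captain_fk int NULL,
    FOREIGN KEY (leader_fk)  REFERENCES members (id),
    FOREIGN KEY (captain_fk) REFERENCES members (id)
) ENGINE=InnoDB;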
Let's say I have this simple table called "characters":
realm_id | character_name | xp
---------|----------------|----------
1 | "mike" | 10
1 | "lara" | 25
2 | "mike" | 40
What I want to do is to have unique names depending on the realm_id. So, for example, while having two "mikes" with different realm_ids is allowed, it's not allowed to have two "mikes" within the same realm_id. Is that possible?
If you're looking to perform a SELECT statement on this data then you'll be looking for something like this (assuming highest XP wins):

SELECT
    realm_id,
    character_name,
    MAX(xp) AS xp
FROM characters
GROUP BY realm_id, character_name;

However, if you want the table to not allow duplicates in the first place, then you're best off making realm_id and character_name a composite primary key. That will stop the duplication from happening in the first place, although you'll have to consider what happens when somebody tries to insert a duplicate: it'll throw an error.
Create a primary key on the table that consists of realm_id and character_name. The primary key will enforce uniqueness in the table across realm and character. Thus, you could have realm_id=1, character_name='Mike' and realm_id=2, character_name='Mike', but if you tried to insert realm_id=1 and character_name='Mike' again, the insert would fail. Your uniqueness is guaranteed.
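For instance (standard SQL; adjust the types to taste):

CREATE TABLE characters (
    realm_id       int         NOT NULL,
    character_name varchar(50) NOT NULL,
    xp             int         NOT NULL DEFAULT 0,
    PRIMARY KEY (realm_id, character_name)
);

-- On an existing table, a unique constraint gives the same guarantee:
-- ALTER TABLE characters
--     ADD CONSTRAINT uq_realm_name UNIQUE (realm_id, character_name);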
I have an Access table with an AutoNumber primary key, a date, and other data. The first record starts at 36, due to deleted records. I want to change all the primary keys so they begin at 1 and increment, ordered by the date. What's the best way to do this?
I want to change the table from this:
| TestID | Date | Data |
| 36 | 12/02/09 | .54 |
| 37 | 12/04/09 | .52 |
To this:
| TestID | Date | Data |
| 1 | 12/02/09 | .54 |
| 2 | 12/04/09 | .52 |
EDIT: Thanks for the input and those who answered. I think some were reading a little too much into my question, which is okay because it still adds to my learning and thinking process. The purpose of my question was twofold: 1) it would simply be nicer for me to have the PK match the order of my data's dates, and 2) to learn whether something like this was possible for later use, such as if I want to add a new column to the table that numbers the tests, labels the type of test, etc. I am trying to learn a lot at once right now, so I sometimes get confused about where to start. I am building .NET apps and trying to learn SQL and database management, and it is sometimes confusing finding the right info with the different RDBMSs and ways to interact with them.
Following on from MikeW, you can use SQL along these lines to copy the data from the old table to the new one. Leave TestID out of the column lists so the AutoNumber field generates fresh values, and order by date so the new IDs follow date order:

INSERT INTO NewTable ([Date], Data)
SELECT [Date], Data
FROM OldTable
ORDER BY [Date];

The new TestID will start from 1 if you use an AutoNumber field.
I would create a new table, with autoincrement.
Then select all the existing data into it, ordering by date. That will result in the IDs being recreated from "1".
Then you could drop the original table, and rename the new one.
Assuming no foreign keys - if so you'd have to drop and recreate those too.
An Autonumber used as a surrogate primary key is not data, but metadata used to do nothing but connect records in related tables. If you need to control the values in that field, then it's data, and you can't use an Autonumber; you have to roll your own autoincrement routine. You might want to look at this thread for a starting point, but code for this for use in Access is available everywhere Access programmers congregate on the Net.
I agree that the value of the auto-generated IDENTITY values should have no meaning, even for the coder, but for education purposes, here's how to reseed the IDENTITY using ADO:
ACC2000: Cannot Change Default Seed and Increment Value in UI
Note that the article is out of date where it says, "there are no options available in the user interface (UI) for you to make this change." In later versions of Access, the SQL DDL can be executed when in ANSI-92 query mode, e.g. something like this:
ALTER TABLE MyTable ALTER COLUMN TestID COUNTER(1, 1);