What are the best practices regarding storing metadata about a row with a row?
Take the example of an inter-bank financial transfer. The Transfer might look like:
CREATE TABLE Transfers (
TransferID int,
FromTransit varchar(10),
FromBranch varchar(10),
FromAccount varchar(50),
ToTransit varchar(10),
ToBranch varchar(10),
ToAccount varchar(50),
Amount money,
Status varchar(50));
But now, of course, people will want to see meta-data:
ALTER TABLE Transfers
ADD
CreatedDate datetime,
LastModifiedDate datetime,
CreatedByUsername varchar(50),
CreatedByFullname varchar(200),
CreatedByWorkstation varchar(50),
VoidedDate datetime NULL,
VoidedByUsername varchar(50) NULL,
VoidedByFullname varchar(200) NULL,
VoidApprovedBySupervisorUsername varchar(50) NULL,
VoidApprovedBySupervisorFullname varchar(200) NULL,
VoidApprovedBySupervisorWorkstation varchar(50) NULL,
SentDate datetime NULL,
SentByUsername varchar(50) NULL,
SentByFullname varchar(50) NULL,
SentByWorkstation varchar(50) NULL,
SendApprovedBySupervisorUsername varchar(50) NULL,
SendApprovedBySupervisorFullname varchar(50) NULL,
SendApprovedBySupervisorWorkstation varchar(50) NULL,
SendConfirmationNumber varchar(50) NULL,
SentToRemoteMachineName varchar(50) NULL,
ReceivedDate datetime NULL,
ReceivedConfirmationNumber varchar(50) NULL,
ReceivedToRemoteMachineName varchar(50) NULL,
ReceivedByUsername varchar(50) NULL,
ReceivedByFullname varchar(50) NULL,
ReceivedByWorkstation varchar(50) NULL,
ReceiveApprovedBySupervisorUsername varchar(50) NULL,
ReceiveApprovedBySupervisorFullname varchar(50) NULL,
ReceivedApprovedBySupervisorWorkstation varchar(50) NULL,
ReceivedCheckedBySupervisorUsername varchar(50) NULL,
ReceivedCheckedBySupervisorFullname varchar(50) NULL,
ReceivedCheckedBySupervisorWorkstation varchar(50) NULL
These are all well-defined values that will all appear on the hard copy related to a transfer.
We already have audit logging of changes in tables, but that wouldn't catch something like:
UPDATE Transfers SET Status = 'TransferStatus_Received'
WHERE TransferID = 6744891
It would catch the username, fullname, and machine name of the person who made the change; but it can't know the name of the supervisor who was over the person's shoulder to enter their credentials to "authorize" the transfer to be received.
My aggravation comes when they ask for another piece of information to be tracked, and I have to add more metadata columns to my data table.
Is this best practice?
This is not good practice for financial databases, because you allow updates. If you allow updates, it does not matter what logging, auditing, crypto keys or whatever else you add, since a hostile party could simply update those as well.
Instead you must forbid updates; all changes must be inserts. Every table should have an indexed sequential column (which related tables reference as a foreign key), and all joins are on Max(seq). This means you perform all transactions on the latest data but have a permanent record of every transaction on these tables.
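A minimal sketch of that idea (the Seq column and the cut-down column list here are illustrative only):
-- Append-only: every change is a new row; nothing is ever updated or deleted
CREATE TABLE Transfers (
Seq int IDENTITY(1,1) PRIMARY KEY,  -- indexed sequential column
TransferID int NOT NULL,            -- logical key; repeats once per revision
Amount money NOT NULL,
Status varchar(50) NOT NULL
);
-- The current state of a transfer is simply its row with the highest Seq
SELECT *
FROM Transfers t
WHERE t.Seq = (SELECT MAX(Seq) FROM Transfers WHERE TransferID = t.TransferID);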
Edit: If what you're asking is whether you should add the audit columns to the original table, that depends on whether the audit columns are sparse or nullable. From your comments, it seems they are.
In that case, you should create separate tables for each nullable group of audit attributes and perform an outer join on those tables, joining on the sequential column of the original table. This means you can add or drop audit tables at will without affecting your data table. Something like:
SELECT t.TransferID, t.Amount, u.Date, u.workstation, s.name, ...
FROM Transfers t
LEFT OUTER JOIN Users u ON u.seq = t.seq
LEFT OUTER JOIN Supervisors s ON s.seq = t.seq
WHERE t.seq = (SELECT Max(seq) FROM Transfers WHERE whatever)
You can create a view or stored procedure that saves Max(seq) if you need to reuse it in a transaction.
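The Users and Supervisors tables in that query aren't defined above; as an illustrative sketch, each nullable group of audit attributes would get its own narrow table keyed on the transfer's sequential column, something like:
CREATE TABLE Users (
seq int NOT NULL,           -- references Transfers.seq
Date datetime NOT NULL,
workstation varchar(50) NULL
);
CREATE TABLE Supervisors (
seq int NOT NULL,           -- references Transfers.seq
name varchar(200) NULL
);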
I don't know much about SQL Server, but when confronted with this in an Oracle scenario I tend to employ triggers (insert/update/delete) which copy the complete row (before and after) into an "archive/audit" table and add whatever "metadata" they want logged along with it. That way my app-centric data model doesn't get polluted as far as applications/SPs etc. are concerned, and no app/user has access to that sensitive logging/auditing information.
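A rough SQL Server equivalent of that idea (assuming a hypothetical Transfers_Audit table with the copied columns plus AuditAction, AuditUser and AuditDate) might look like:
CREATE TRIGGER trg_Transfers_Audit
ON Transfers
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
-- "before" image of updated/deleted rows
INSERT INTO Transfers_Audit (TransferID, Amount, Status, AuditAction, AuditUser, AuditDate)
SELECT d.TransferID, d.Amount, d.Status, 'BEFORE', SUSER_SNAME(), GETDATE()
FROM deleted d;
-- "after" image of inserted/updated rows
INSERT INTO Transfers_Audit (TransferID, Amount, Status, AuditAction, AuditUser, AuditDate)
SELECT i.TransferID, i.Amount, i.Status, 'AFTER', SUSER_SNAME(), GETDATE()
FROM inserted i;
END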
I am trying to run the below script but I keep getting the error messages
"Subqueries are not allowed in this context. Only scalar expressions are allowed." on lines 16 and 34.
I know where it's failing (the AS clauses), but I don't know how to rewrite them to stop the errors from appearing.
I have tried looking at other existing questions, but none that I can find have helped, as the issue here is using data from columns in other tables along with columns in the current table.
Could I get some help with getting this working, and some advice on what code would be better, please?
Thanks for your help in advance!
Dan
This is the code for my database:
CREATE DATABASE [LEARNING]
GO
CREATE TABLE Trainees
(
Trainee_ID int IDENTITY(1,1) PRIMARY KEY NOT NULL,
Name varchar(50) NOT NULL,
[Assigned Tutor_ID] int NOT NULL,
)
GO
CREATE TABLE Tutors
(
Tutor_ID int IDENTITY(1,1) PRIMARY KEY NOT NULL,
Name varchar(50) NOT NULL,
[Assigned Trainee_ID] AS (Select Trainee_ID from Trainees where Tutors.[Assigned Trainee_ID] = Trainees.Trainee_ID) NOT NULL
)
GO
CREATE TABLE [Rooms]
(
Room_ID int IDENTITY(1,1) PRIMARY KEY NOT NULL,
[Room Name] varchar(50) NOT NULL,
[Cost per hour] money NOT NULL
)
GO
CREATE TABLE [Rooms Rented]
(
Rented_ID int IDENTITY(1,1) PRIMARY KEY NOT NULL,
Room_ID int NOT NULL,
Tutor_ID int NOT NULL,
[Length of time in hours] int NOT NULL,
[Total Cost] AS (select ([Rooms Rented].[Length of time in hours])*([Rooms].[Cost per hour]) from [Rooms]) NOT NULL
)
GO
INSERT INTO Tutors values ('Nikki Smith',1)
GO
INSERT INTO Trainees Values ('Tyler Hatherall')
GO
INSERT INTO Rooms values ('Training Room 1',6.50)
GO
INSERT INTO [Rooms Rented] values (1,1,2)
GO
A computed column can only reference other columns of the same table, so it cannot look up values from another table directly.
In your case, make [Total Cost] a regular column and populate it with an UPDATE after the table has been created, using a query similar to the one below. You may also want to add a foreign key (e.g. from Room_ID to Rooms), depending on what you are trying to achieve.
UPDATE A
SET A.[Total Cost] = A.[Length of time in hours] * B.[Cost per hour] --add ISNULL to treat NULL if needed
FROM [Rooms Rented] as A
INNER JOIN [Rooms] as B
ON B.Room_ID = A.Room_ID
Your AS statements are computed columns. When your computed columns refer to other tables, you cannot implement this directly. You will have to create scalar functions first.
For example, after creating Rooms, create this function that takes a room id and returns cost per hour:
create function dbo.f_get_Rooms_CostPerHour (@Room_ID int)
returns money
as
begin
return (select [Cost per hour] from [Rooms] where [Rooms].Room_ID = @Room_ID)
end
Now you can use this in your computed column formula. Note that a computed column formula never has a SELECT in it. It also does not have a null/not null specification.
CREATE TABLE [Rooms Rented]
(
Rented_ID int IDENTITY(1,1) PRIMARY KEY NOT NULL,
Room_ID int NOT NULL,
Tutor_ID int NOT NULL,
[Length of time in hours] int NOT NULL,
[Total Cost] AS ([Length of time in hours] * dbo.f_get_Rooms_CostPerHour([Room_ID]))
)
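With that in place, the INSERT from the question works and [Total Cost] is calculated whenever the row is read (values assume the sample rows from the question):
INSERT INTO [Rooms Rented] values (1,1,2)
GO
SELECT Rented_ID, [Total Cost] FROM [Rooms Rented]  -- 2 hours * 6.50 = 13.00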
First I've created a table with information on stores and transactions with the following query:
CREATE TABLE main.store_transactions
(
store_id varchar(100) NOT NULL,
store_name varchar(100),
store_transaction_id varchar(100),
transaction_name varchar(100),
transaction_date timestamp,
transaction_info varchar(200),
PRIMARY KEY (store_id)
)
But then I realized that the same store may have various transactions related to it, not just one. How should I implement table creation in this case?
One thing that comes to mind is to create a separate table with transactions, each transaction having store_id as a foreign key. And then just join when needed.
How is it possible to implement it in a single table?
Well, the most elegant way would indeed be to create a satellite table for your stores and reference it from the store_transactions table, e.g.:
CREATE TABLE stores
(
store_id varchar(100) NOT NULL PRIMARY KEY,
store_name varchar(100)
);
CREATE TABLE store_transactions
(
store_id varchar(100) NOT NULL REFERENCES stores(store_id),
store_transaction_id varchar(100),
transaction_name varchar(100),
transaction_date timestamp,
transaction_info varchar(200)
);
With this structure you will have many transactions to a single store.
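A quick illustration of the one-to-many relationship (the sample values are made up):
INSERT INTO stores (store_id, store_name) VALUES ('S1', 'Main Street');
INSERT INTO store_transactions (store_id, store_transaction_id, transaction_name)
VALUES ('S1', 'T1', 'sale'), ('S1', 'T2', 'refund');
-- one row per transaction, with the store details joined in
SELECT s.store_name, t.store_transaction_id, t.transaction_name
FROM stores s
JOIN store_transactions t ON t.store_id = s.store_id;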
There are other, less appealing options, such as creating a custom data type for stores and storing an array of it in store_transactions, but given how costly such an approach is to maintain, I would definitely discourage it.
I would like to create a database with a couple of tables in it. I am using SSMS, and it is easy enough to accomplish this by right-clicking and creating, but I would like to do it with a query/command line (not sure how to phrase that).
Create a database and table
CREATE DATABASE test_employees;
CREATE TABLE dbo.EmployeesBasicInfo
(
MyKeyField VarChar(8) PRIMARY KEY,
FirstName VarChar(30) NOT NULL,
LastName VarChar(50) NOT NULL,
DateStarted DateTime NOT NULL,
Age Int NOT NULL,
ModifiedDate DateTime NULL
);
But I have no idea where the table goes or how to move/link it to database test_employees.
Also, if you feel ambitious in answering my question, the next step is to auto-generate data for all fields. Any links you could provide would be helpful. And if anything I'm doing is not best practice, please let me know - I'm just getting into SQL.
After you've created the database you need to
Use test_employees
Go
This sets your current working database for subsequent statements.
Alternatively you can qualify your create table with the database name
Create Table test_employees.dbo.EmployeesBasicInfo (
MyKeyField varchar(8) primary key,
FirstName varchar(30) not null,
LastName varchar(50) not null,
DateStarted DateTime not null,
Age int not null,
ModifiedDate datetime null
);
You've probably created this table in your default database, which may well be master.
Also for any recent version of SQL Server, consider using datetime2 instead of datetime (or date if you don't need the time component). It's better in all respects.
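For example, the two date columns above would simply become:
DateStarted datetime2 not null,
ModifiedDate datetime2 null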
Here is some code to create a table first, and then you can add some test data into it to practice with.
TO CREATE A TABLE:
Create Table dbo.BasicInfo (
MyKeyField INT primary key IDENTITY(1,1),
FirstName varchar(30) not null,
LastName varchar(50) not null,
DateStarted DateTime not null,
Age int not null,
ModifiedDate datetime null
)
GO
TO ADD PRACTICE DATA:
DECLARE @Value INT = 1
WHILE @Value <= 100
BEGIN
INSERT INTO dbo.BasicInfo (FirstName, LastName, DateStarted, Age, ModifiedDate)
VALUES ('First_Name_' + CAST(@Value AS VARCHAR), 'Last_Name_' + CAST(@Value AS VARCHAR), DATEADD(DAY, -CONVERT(INT, (5000+1)*RAND()), GETDATE()),
18 + CONVERT(INT, (30-10+1)*RAND()), DATEADD(DAY, 10 + (30-10)*RAND(), DATEADD(DAY, -CONVERT(INT, (5000+1)*RAND()), GETDATE())))
SET @Value = @Value + 1
END
If you want to add more than 100 rows to your table, just replace 100 in this code with the number of rows you wish to add.
When creating tables, I have generally created them with a couple extra columns that track change times and the corresponding user:
CREATE TABLE dbo.Object
(
ObjectId int NOT NULL IDENTITY (1, 1),
ObjectName varchar(50) NULL ,
CreateTime datetime NOT NULL,
CreateUserId int NOT NULL,
ModifyTime datetime NULL ,
ModifyUserId int NULL
) ON [PRIMARY]
GO
I have a new project now where if I continued with this structure I would have 6 additional columns on each table with this type of change tracking. A time column, user id column and a geography column. I'm now thinking that adding 6 columns to every table I want to do this on doesn't make sense. What I'm wondering is if the following structure would make more sense:
CREATE TABLE dbo.Object
(
ObjectId int NOT NULL IDENTITY (1, 1),
ObjectName varchar(50) NULL ,
CreateChangeId int NOT NULL,
ModifyChangeId int NULL
) ON [PRIMARY]
GO
-- foreign key relationships on CreateChangeId & ModifyChangeId
CREATE TABLE dbo.Change
(
ChangeId int NOT NULL IDENTITY (1, 1),
ChangeTime datetime NOT NULL,
ChangeUserId int NOT NULL,
ChangeCoordinates geography NULL
) ON [PRIMARY]
GO
Can anyone offer some insight into this minor database design problem, such as common practices and functional designs?
Where I work, we use the same construct as yours - every table has the following fields:
CreatedBy (int, not null, FK users table - user id)
CreationDate (datetime, not null)
ChangedBy (int, null, FK users table - user id)
ChangeDate (datetime, null)
Pro: easy to track and maintain; only one I/O operation (I'll come to that later)
Con: I can't think of any at the moment (well ok, sometimes we don't use the change fields ;-)
IMO the approach with the extra table has the problem that you somehow also have to reference the owning table for every record (unless you only need the one direction, Object to Tracking table). The approach also leads to more database I/O operations - for every insert or modify you will need to:
add entry to Table Object
add entry to Tracking Table and get the new Id
update Object Table entry with the Tracking Table Id
It would certainly make the application code that communicates with the DB a bit more complicated and error-prone.
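To make that concrete, here is a rough T-SQL sketch (using the tables above; the variable values are illustrative) of what a single modify would involve with the separate Change table:
DECLARE @ObjectId int = 1, @UserId int = 42;
-- 1) change the data row itself
UPDATE dbo.Object SET ObjectName = 'new name' WHERE ObjectId = @ObjectId;
-- 2) add the tracking entry and capture its new id
INSERT INTO dbo.Change (ChangeTime, ChangeUserId) VALUES (GETDATE(), @UserId);
-- 3) point the data row at the tracking entry
UPDATE dbo.Object SET ModifyChangeId = SCOPE_IDENTITY() WHERE ObjectId = @ObjectId;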
I have some txt files that contain tables with a mix of different record types, each with different value types and column definitions. I was thinking of importing them into one table and running a query to separate the different record types, since an identifier for this is listed in the first column. Is there a way to change the value type of a column in a query? It will be a pain to treat all of them as text. If you have any other suggestions on how to solve this, please let me know as well.
Here is an example of tables for 2 record types provided by the website where I got the data from
create table dbo.PUBACC_A2
(
Record_Type char(2) null,
unique_system_identifier numeric(9,0) not null,
ULS_File_Number char(14) null,
EBF_Number varchar(30) null,
spectrum_manager_leasing char(1) null,
defacto_transfer_leasing char(1) null,
new_spectrum_leasing char(1) null,
spectrum_subleasing char(1) null,
xfer_control_lessee char(1) null,
revision_spectrum_lease char(1) null,
assignment_spectrum_lease char(1) null,
pfr_status char(1) null
)
go
create table dbo.PUBACC_AC
(
record_type char(2) null,
unique_system_identifier numeric(9,0) not null,
uls_file_number char(14) null,
ebf_number varchar(30) null,
call_sign char(10) null,
aircraft_count int null,
type_of_carrier char(1) null,
portable_indicator char(1) null,
fleet_indicator char(1) null,
n_number char(10) null
)
Yes, you can do what you want. In MS Access you can use any VBA function in a query, with something like:
IIF(FirstColumn="value1", CDate(SecondColumn), NULL) as DateValue,
IIF(FirstColumn="value2", CDec(SecondColumn), NULL) as DecimalValue,
IIF(FirstColumn="value3", CStr(SecondColumn), NULL) as StringValue
You can use all/any of the above in your SELECT.
EDIT:
From your comments it seems that you want to split them into different tables - importing as text should not be a problem in that case.
a)
After you import and get the data into the initial table, create the proper tables manually; then you can INSERT into them.
b)
You could even do a make-table query, but it might be faster to create the table manually. If you do a make-table query, you have to be sure that you have cast the data into the proper types in your SELECT.
EDIT2:
As you updated the question showing the structure it becomes obvious that my suggestion above will not help directly.
If this is a one-time process, you can follow HLGEM's solution. Here are some more details.
1) Import into a table with two columns - RecordType char(2), Rest memo
2) Now you can split the data (make two queries that select based on RecordType) and re-export it (to be able to use Access's import wizard)
3) Now you have two text files with proper structure which can be easily imported
I did this in my last job. You start with a staging table that has one column, or two columns if your identifier is always the same length.
Then, using the record identifier, you move the data to another set of staging tables, one for each type of record you have. These have proper columns for the data, with the correct data types. Then you do any data cleaning you need to do, and finally insert into the real production tables.
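A rough T-SQL sketch of the first two steps, assuming a hypothetical Staging_Raw table and a typed Staging_A2 table shaped like the A2 layout from the question (the parse of the raw line is illustrative only):
-- raw staging: just the record identifier plus the rest of the line as text
CREATE TABLE dbo.Staging_Raw (
Record_Type char(2) NULL,
Rest varchar(max) NULL
);
-- split by record type into the typed staging table, casting as you go
INSERT INTO dbo.Staging_A2 (Record_Type, unique_system_identifier)
SELECT Record_Type,
CAST(LEFT(Rest, 9) AS numeric(9,0))  -- illustrative parse; a real file needs proper splitting
FROM dbo.Staging_Raw
WHERE Record_Type = 'A2';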
If you have a column defined as text, because it has both alphas and numbers, you'll only be able to query it as if it were text. Once you've separated out the different "types" of data into their own tables, you should be able to change the schema definition. Please comment here if I'm misunderstanding what you're trying to do.