How to make bulk insert work with multiple tables - sql

How with SQL server bulk insert can I insert into multiple tables when there is a foreign key relationship?
What I mean is that the tables are this,
CREATE TABLE [dbo].[UndergroundFacilityShape]
([FacilityID] [int] IDENTITY(1,1) NOT NULL,
[FacilityTypeID] [int] NOT NULL,
[FacilitySpatialData] [geometry] NOT NULL)
CREATE TABLE [dbo].[UndergroundFacilityDetail]
([FacilityDetailID] [int] IDENTITY(1,1) NOT NULL,
[FacilityID] [int] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[Value] [nvarchar](255) NOT NULL)
So each UndergroundFacilityShape can have multiple UndergroundFacilityDetail rows. The problem is that the FacilityID is not defined until the insert is done, because it is an identity column. If I bulk insert the data into the Shape table, then I cannot match it back up to the Detail data I have in my C# application.
I am guessing the solution is to run a SQL statement to find out what the next identity value is, populate the values myself, and turn off the identity column for the bulk insert? Bear in mind that only one person is going to be running this application to insert data, and it will be done infrequently, so we don't have to worry about identity values clashing or anything like that.
I am trying to import thousands of records, which takes about 3 minutes using standard inserts, but bulk insert will take a matter of seconds.
In the future I am expecting to import data that is much bigger than 'thousands' of records.

Turns out that this is quite simple. Get the current identity values on each of the tables and populate them into the DataTable myself, incrementing them as I use them. I also have to make sure that the correct values are used to maintain the relationship. That's it. It doesn't seem to matter whether I turn off the identity columns or not.
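A sketch of the identity lookup, using the tables above (how the values get assigned to the DataTable rows on the C# side is up to you):

-- Next FacilityID the Shape table will hand out. Note: on a table that has
-- never had a row, IDENT_CURRENT returns the seed and the first inserted
-- value will be the seed itself, so treat that case separately.
SELECT IDENT_CURRENT('dbo.UndergroundFacilityShape')
     + IDENT_INCR('dbo.UndergroundFacilityShape') AS NextFacilityID;
SELECT IDENT_CURRENT('dbo.UndergroundFacilityDetail')
     + IDENT_INCR('dbo.UndergroundFacilityDetail') AS NextFacilityDetailID;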
I have been using this tool on live data for a while now and it works fine.
It was well worth it, though: the import takes no more than 3 seconds (rather than 3 minutes), and I expect to receive larger datasets at some point.
So what about if more than one person uses the tool at the same time? Well, yes, I'd expect issues, but this is never going to be the case for us.

Peter, you mentioned that you already have a solution with straight INSERTs.
If the destination table does not have a clustered index (or has a clustered index and is empty), just using the TABLOCK query hint will make it a minimally-logged transaction, resulting in a considerable speed-up.
If the destination table has a clustered index and is not empty, you can also enable trace flag 610 in addition to the TABLOCK query hint to make it a minimally-logged transaction.
Check the "Using INSERT INTO…SELECT to Bulk Import Data with Minimal Logging" section on the INSERT MSDN page.
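For example, a sketch of that pattern, assuming the raw data has first been bulk copied into a staging table (the staging table name here is made up):

INSERT INTO dbo.UndergroundFacilityShape WITH (TABLOCK)
    (FacilityTypeID, FacilitySpatialData)
SELECT FacilityTypeID, FacilitySpatialData
FROM dbo.UndergroundFacilityShape_Staging; -- hypothetical staging table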


Is there a "standard" Primary Key pool table implementation method?

Back in the "good ol' days" I used Sybase's SQL Anywhere database. It had a feature to avoid collisions when multiple users created new records: a separate table of key values was used to dole out blocks of unique keys to client applications for use in subsequent inserts into other tables. When a client's pool of keys ran low, it requested another block of keys from the server. The keys could be specific to a single table (that is, each table has its own key pool), or the keys could be "shared" among tables, such that an INSERT INTO Table1 might use Key=100, and a following INSERT INTO Table2 would then use Key=101.
This key pool had the benefit that the primary key assigned on the client side could also be used as a foreign key when creating inserts into other tables - all on the client side, without first committing the transaction in case the user ultimately abandons the new data.
I've searched for similar functionality, but I only seem to find database replication and mirroring, not anything about a table of keys.
We are using a shared database and multiple clients running a VB.NET application for data access and creation.
The basic table I had in mind looks something like:
CREATE TABLE [KeyPool] (
[KeyNo] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[AssignedTo] [varchar](50) NULL,
[Status] [nchar](10) NULL,
[LastTouched] [datetime2] NULL
)
The Status and LastTouched columns would allow for recovery of "lost keys" if garbage collection was desired, but these are not really necessary. In fact, simply having a single row that stores the last key value given to a client would be the minimum requirement: Just hand out keys in blocks of 1000 upon request and increment the counter to know what block to hand out next. However, without the table that is tracking who has what keys, there would be lots of "wasted" key values (which may or may not be an issue depending on the potential number of records expected in the database).
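For reference, a minimal sketch of that single-row counter variant (table and column names are made up): one row holds the last key handed out, and a single UPDATE both reserves a block of 1000 keys and reads back the start of the block atomically.

CREATE TABLE [KeyCounter] ([LastKey] int NOT NULL);
INSERT INTO [KeyCounter] ([LastKey]) VALUES (0);

-- Reserve a block of 1000 keys for the calling client
DECLARE @firstKey int;
UPDATE [KeyCounter]
SET @firstKey = [LastKey] + 1,
    [LastKey] = [LastKey] + 1000;
SELECT @firstKey AS FirstKey, @firstKey + 999 AS LastKeyOfBlock;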
I'm looking for any "standard" methods in SQL Server before I go out and duplicate the effort of creating my own solution.
Use a Sequence object. You can use ‘NEXT VALUE FOR’ in a default or query, or request blocks of keys from the client using sp_sequence_get_range.
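For example (SQL Server 2012 or later; the sequence name here is arbitrary):

CREATE SEQUENCE dbo.KeyPoolSequence AS int START WITH 1 INCREMENT BY 1;

-- Single keys, e.g. in a DEFAULT constraint or directly in an INSERT
SELECT NEXT VALUE FOR dbo.KeyPoolSequence AS NewKey;

-- Reserve a block of 1000 keys for one client in a single call
DECLARE @first sql_variant;
EXEC sys.sp_sequence_get_range
    @sequence_name     = N'dbo.KeyPoolSequence',
    @range_size        = 1000,
    @range_first_value = @first OUTPUT;
SELECT @first AS FirstKeyOfBlock;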

How to lock a specific row that doesn't exist yet in SQL Server

I have an API rate limit table that I'm managing for one of our applications. Here's the definition of it.
CREATE TABLE [dbo].[RateLimit]
(
[UserId] [int] NOT NULL,
[EndPointId] [smallint] NOT NULL,
[AllowedRequests] [smallint] NOT NULL,
[ResetDateUtc] [datetime2](0) NOT NULL,
CONSTRAINT [PK_RateLimit]
PRIMARY KEY CLUSTERED ([UserId] ASC, [EndPointId] ASC)
) ON [PRIMARY]
The process that performs CRUD operations on this table is multi-threaded, and therefore careful consideration needs to be given to this table, which acts as the go-to for rate limit checks (i.e. have we surpassed our rate limit, can we make another request, etc.).
I'm trying to introduce SQL locks to enable the application to reliably INSERT, UPDATE, and SELECT values without having the value changed out from under it. Besides the normal complexity of this, the big pain point is that the RateLimit record for the UserId+EndPointId may not exist - and would need to be created.
I've been investigating SQL locks, but the thing is that there might be no row to lock if the rate limit record doesn't exist yet (i.e. first run).
I've thought about creating a temp table used specifically for controlling the lock flow - but I'm unsure how this would work.
At the farthest extreme, I could wrap the SQL statement in a SERIALIZABLE transaction (or something to that degree), but locking the entire table would have drastic performance impacts - I only care about the userid+endpointid primary key, and making sure that the specific row is read + updated/inserted by one process at a time.
How can I handle this situation?
Version: SQL Server 2016
Notes: READ_COMMITTED_SNAPSHOT is enabled
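One way to narrow the SERIALIZABLE idea down to a single key rather than the whole table is to take UPDLOCK and HOLDLOCK on a seek against the primary key: the hints acquire a key-range lock that also covers a row that does not exist yet, so concurrent callers for the same UserId+EndPointId serialize while other keys are unaffected. A sketch, assuming this runs inside a stored procedure with @UserId and @EndPointId parameters (@DefaultAllowedRequests and @DefaultResetDateUtc are made-up names for the initial values):

BEGIN TRANSACTION;

SELECT AllowedRequests, ResetDateUtc
FROM dbo.RateLimit WITH (UPDLOCK, HOLDLOCK)
WHERE UserId = @UserId AND EndPointId = @EndPointId;

IF @@ROWCOUNT = 0
    -- first run for this UserId + EndPointId: create the row
    INSERT INTO dbo.RateLimit (UserId, EndPointId, AllowedRequests, ResetDateUtc)
    VALUES (@UserId, @EndPointId, @DefaultAllowedRequests, @DefaultResetDateUtc);
ELSE
    -- row exists: consume one request
    UPDATE dbo.RateLimit
    SET AllowedRequests = AllowedRequests - 1
    WHERE UserId = @UserId AND EndPointId = @EndPointId;

COMMIT TRANSACTION;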

SQL Indexing Strategy on Link Tables

I often find myself creating 'link tables'. For example, the following table maps a user record to an event record.
CREATE TABLE [dbo].[EventLog](
[EventId] [int] NOT NULL,
[UserId] [int] NOT NULL,
[Time] [datetime] NOT NULL,
[Timestamp] [timestamp] NOT NULL
)
For the purposes of this question, please assume the combination of EventId plus UserId is unique and that the database in question is a MS SQL Server 2008 installation.
The problem I have is that I am never sure as to how these tables should be indexed. For example, I might want to list all users for a particular event, or I might want to list all events for a particular user or, perhaps, retrieve a particular EventId/UserId record. Indexing options I have considered include:
Creating a compound primary key on EventId and UserId (but I understand the index won't be useful when accessing by UserId on its own).
Creating a compound primary key on EventId and UserId and adding a supplemental index on UserId.
Creating a primary key on EventId and a supplemental index on UserId.
Any advice would be appreciated.
Indexes are designed to solve performance problems. If you don't yet have such a problem and can't predict exactly where you'll run into trouble, then you shouldn't create indexes. Indexes are quite expensive: they not only take up disk space but also add overhead to every write or modification. So you need to clearly understand which specific performance problem you are trying to solve by creating an index, so you can judge whether it is worth creating.
The answer to your question depends on several aspects.
It depends on the DBMS you are going to use. Some prefer single-column indexes (like PostgreSQL), some can take more advantage of multi-column indexes (like Oracle). Some can answer a query completely from a covering index (like SQLite), others cannot and eventually have to read the pages of the actual table (again, like PostgreSQL).
It depends on the queries you want to answer. For example, do you navigate in both directions, i.e., do you join on both of your Id columns?
It depends on your space and processing time requirements for data modification, too. Keep in mind that indexes are often bigger than the actual table that they index, and that updating indexes is often more expensive than just updating the underlying table.
EDIT:
When your conceptual model has a many-to-many relationship R between two entities E1 and E2, i.e., the logical semantics of R is either "related" or "not related", then I would always declare the combined primary key for R. That will create a unique index. The primary motivation is, however, data consistency, not query optimization, i.e.:
CREATE TABLE [dbo].[EventLog](
[EventId] [int] NOT NULL,
[UserId] [int] NOT NULL,
[Time] [datetime] NOT NULL,
[Timestamp] [timestamp] NOT NULL,
PRIMARY KEY([EventId],[UserId])
)
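If queries also filter by UserId alone (the second option listed in the question), a supplemental nonclustered index covers that access path; a sketch (the index name is arbitrary):

CREATE NONCLUSTERED INDEX IX_EventLog_UserId
ON [dbo].[EventLog]([UserId], [EventId]);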

SQL Database Deleted Flag and DeletedBy DeletedOn

I've added a history table to my database. Originally I added a bit column called Deleted and intended to update it to 1 if that row was ever deleted; otherwise each row represents an update.
Then I was informed we need to log who deleted what and when. So I added nullable [DeletedBy] and [DeletedOn] fields.
At this point I was wondering if this made my Deleted bit redundant. You could simply query the table, checking where DeletedBy is not null, if you want to see which rows are deleted.
I intended to ask in this question which is better practice:
Having the extra Bit Column
Using the nullable columns that are already there, to Identify Deleted Rows
But I'm starting to think this is a preference thing. So instead my question is, which is more efficient? If this table gets massive, is there a performance gain to running:
Select * from MyTable where [Deleted] = 1
over
Select * from MyTable where [DeletedBy] is not null
This is more of a preference. Technically the datetime field is larger than a bit field, but since you are required to store it anyway it does not really matter. However, performance-wise you can index either and get the same results. I personally think the bit field is redundant and would use the nullable datetime.
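For example, the predicate on DeletedBy can be supported like this (a sketch against the table from the question; the filtered-index variant is an addition, not something the answer proposes, and it keeps the index small because only deleted rows are stored in it):

CREATE NONCLUSTERED INDEX IX_MyTable_DeletedBy
ON MyTable (DeletedBy)
WHERE DeletedBy IS NOT NULL;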
If you added the 'Deleted' bit a while ago, and there are already records in your live database that are 'deleted', then you need to keep the bit field, as you don't have the information to enter in the 'deleted by' at this stage (I imagine).
Well, you do need to know who deleted a row, so the DeletedBy column must stay. Which makes the main question: should you keep the bit column or not?
The answer is simple: no :)
I know it is just a bit column and it doesn't occupy much, but a bit multiplied by a lot of rows is a lot of bits. It probably won't impact your storage, of course, but there is no reason to keep redundant data in this case.
Regarding the Deleted = 1 rows you may already have, just update DeletedBy to something like 'system', or anything that tells you that the record was deleted before the new field was introduced.
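A sketch of that backfill ('system' is just a placeholder label; the table and column names come from the question):

UPDATE MyTable
SET DeletedBy = 'system' -- marks rows deleted before DeletedBy/DeletedOn existed
WHERE Deleted = 1
  AND DeletedBy IS NULL;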
You are basically creating an audit trail, and it's simple to do. First, create all of your audit tables with some standard fields for audit information. For example:
[audit_id] [int] IDENTITY(1,1) NOT NULL,
[audit_action] [varchar](16) NOT NULL,
[audit_date] [datetime] NOT NULL,
[audit_user_name] [varchar](128) NOT NULL,
--<your fields from the table being audited>
Default the audit_date to a value of getdate(). If you are using Active Directory security, default audit_user_name to a value of suser_sname(), otherwise you'll have to provide this in your query.
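Put together, the audit table header looks something like this (my_audit_trail_table matches the name used in the trigger below):

CREATE TABLE [dbo].[my_audit_trail_table](
[audit_id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[audit_action] [varchar](16) NOT NULL,
[audit_date] [datetime] NOT NULL DEFAULT (getdate()),
[audit_user_name] [varchar](128) NOT NULL DEFAULT (suser_sname())
--<your fields from the table being audited>
)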
Now, create a trigger for INSERT, UPDATE, and DELETE for the table to be audited. You'll write the values into your audit table. Here is an example for DELETE:
CREATE TRIGGER [dbo].[tr_my_table_being_audited_delete]
ON [dbo].[my_table_being_audited]
AFTER DELETE
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO dbo.my_audit_trail_table (audit_action, --<your fields from the table being audited>)
SELECT 'DELETE', --<your fields from the table being audited>
FROM deleted
END
For massive tables I really don't like using soft deletes; I prefer archiving, but I understand all projects are different.
I would probably just keep the 'Deleted' flag on the primary table, since it's a little less overhead, and create a DeletionLog table with 'DeletedBy' and 'Timestamp' for auditing.
This would be especially beneficial in a high-read environment.
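A sketch of that layout (the DeletionLog name and columns are illustrative, not from the original answer):

CREATE TABLE [dbo].[DeletionLog](
[DeletionLogId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[TableName] [sysname] NOT NULL,
[RecordId] [int] NOT NULL,
[DeletedBy] [nvarchar](128) NOT NULL,
[DeletedOn] [datetime2](0) NOT NULL DEFAULT (sysutcdatetime())
)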

Create DB2 History Table Trigger

I want to create a history table to track field changes across a number of tables in DB2.
I know history is usually done by copying an entire table's structure and giving it a suffixed name (e.g. user --> user_history). Then you can use a pretty simple trigger to copy the old record into the history table on an UPDATE.
However, for my application this would use too much space. It doesn't seem like a good idea (to me at least) to copy an entire record to another table every time a field changes. So I thought I could have a generic 'history' table which would track individual field changes:
CREATE TABLE history
(
history_id BIGINT GENERATED ALWAYS AS IDENTITY,
record_id INTEGER NOT NULL,
table_name VARCHAR(32) NOT NULL,
field_name VARCHAR(64) NOT NULL,
field_value VARCHAR(1024),
change_time TIMESTAMP,
PRIMARY KEY (history_id)
);
OK, so every table that I want to track has a single, auto-generated id field as the primary key, which would be put into the 'record_id' field. And the maximum VARCHAR size in the tables is 1024. Obviously if a non-VARCHAR field changes, it would have to be converted into a VARCHAR before inserting the record into the history table.
Now, this could be a completely misguided way to do things (hey, let me know why if it is), but I think it's a good way of tracking changes that need to be pulled up rarely and need to be stored for a significant amount of time.
Anyway, I need help with writing the trigger to add records to the history table on an update. Let's for example take a hypothetical user table:
CREATE TABLE user
(
user_id INTEGER GENERATED ALWAYS AS IDENTITY,
username VARCHAR(32) NOT NULL,
first_name VARCHAR(64) NOT NULL,
last_name VARCHAR(64) NOT NULL,
email_address VARCHAR(256) NOT NULL,
PRIMARY KEY(user_id)
);
So, can anyone help me with a trigger on an update of the user table to insert the changes into the history table? My guess is that some procedural SQL will need to be used to loop through the fields in the old record, compare them with the fields in the new record and if they don't match, then add a new entry into the history table.
It'd be preferable to use the same trigger action SQL for every table, regardless of its fields, if it's possible.
Thanks!
I don't think this is a good idea, as you generate even more overhead per value with a big table where more than one value changes. But that depends on your application.
Furthermore, you should consider the practical value of such a history table. You have to pull a lot of rows together to even get a glimpse of the context of a changed value, and it requires you to code another application that does just this complex history logic for an end user. And for a DB admin it would be cumbersome to restore values out of the history.
It may sound a bit harsh, but that is not the intent. An experienced programmer in our shop had a similar idea using table journaling. He got it up and running, but it ate disk space like there was no tomorrow.
Just think about what your history table should really accomplish.
Have you considered doing this as a two-step process? Implement a simple trigger that records the original and changed version of the entire row. Then write a separate program that runs once a day to extract the changed fields as you describe above.
This makes the trigger simpler, safer, and faster, and you have more choices for how to implement the post-processing step.
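A sketch of that first step against the user table from the question, assuming a user_history table with the same columns plus a row_version marker and a changed_at timestamp (those two names are made up):

CREATE TRIGGER user_history_upd
AFTER UPDATE ON user
REFERENCING OLD AS o NEW AS n
FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
-- one row for the pre-update image, one for the post-update image;
-- the daily extract job can then diff the pairs field by field
INSERT INTO user_history
(row_version, user_id, username, first_name, last_name, email_address, changed_at)
VALUES ('OLD', o.user_id, o.username, o.first_name, o.last_name, o.email_address, CURRENT TIMESTAMP),
('NEW', n.user_id, n.username, n.first_name, n.last_name, n.email_address, CURRENT TIMESTAMP);
END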
We do something similar on our SQL Server database, but the audit tables are per individual table audited (one central table would be huge, as our database is many, many gigabytes in size).
One thing you need to do is make sure you also record who made the change. You should also record the old and new value together (makes it easier to put data back if you need to) and the change type (insert, update, delete). You don't mention recording deletes from the table, but we find those are some of the things we most frequently use the audit tables for.
We use dynamic SQL to generate the code to create the audit tables (by using the table that stores the system information), and all audit tables have the exact same structure (makes it easier to get data back out).
When you create the code to store the data in your history table, create the code as well to restore the data if need be. This will save tons of time down the road when something needs to be restored and you are under pressure from senior management to get it done now.
Now, I don't know if you were planning to be able to restore data from your history table, but once you have one, I can guarantee that management will want it used that way.
CREATE TABLE HIST.TB_HISTORY (
HIST_ID BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 0, INCREMENT BY 1, NO CACHE) NOT NULL,
HIST_COLUMNNAME VARCHAR(128) NOT NULL,
HIST_OLDVALUE VARCHAR(255),
HIST_NEWVALUE VARCHAR(255),
HIST_CHANGEDDATE TIMESTAMP NOT NULL,
PRIMARY KEY(HIST_ID)
)
GO
CREATE TRIGGER COMMON.TG_BANKCODE AFTER
UPDATE OF FRD_BANKCODE ON COMMON.TB_MAINTENANCE
REFERENCING OLD AS oldcol NEW AS newcol FOR EACH ROW MODE DB2SQL
WHEN(COALESCE(newcol.FRD_BANKCODE,'#null#') <> COALESCE(oldcol.FRD_BANKCODE,'#null#'))
BEGIN ATOMIC
CALL FB_CHECKING.SP_FRAUDHISTORY_ON_DATACHANGED(
newcol.FRD_FRAUDID,
'FRD_BANKCODE',
oldcol.FRD_BANKCODE,
newcol.FRD_BANKCODE,
newcol.FRD_UPDATEDBY
);--
INSERT INTO FB_CHECKING.TB_FRAUDMAINHISTORY(
HIST_COLUMNNAME,
HIST_OLDVALUE,
HIST_NEWVALUE,
HIST_CHANGEDDATE