How to add a row and timestamp one SQL Server table based on a change in a single column of another SQL Server table - sql

[UPDATE: 2/20/19]
I figured out a pretty trivial solution to solve this problem.
CREATE TRIGGER TriggerClaims_History on Claims
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON
INSERT INTO Claims_History
SELECT name, status, claim_date
FROM Claims
EXCEPT SELECT name, status, claim_date FROM Claims_History
END
GO
I am standing up a SQL Server database for a project I am working on. Important info: I have 3 tables - enrollment, cancel, and claims. There are files located on a server that populate these tables every day. These files are NOT deltas (i.e. each new file placed on server every day contains data from all previous files) and because of this, I am able to simply drop all tables, create tables, and then populate tables from files each day. My question is regarding my claims table - since tables will be dropped and created each night, I need a way to keep track of all the different status changes.
I'm struggling to figure out the best way to go about this.
I was thinking of creating a claims_history table that is NOT dropped each night. Essentially I'd want my claims_history table to be populated each time an initial new record is added to the claims table. Then I'd want to scan the claims table and add a row to the claims_history table if and only if there was a change in the status column (i.e. claims.status != claims_history.status).
Day 1:
select * from claims
id | name | status
1 | jane doe | received
select * from claims_history
id | name | status | timestamp
1 | jane doe | received | datetime
Day 2:
select * from claims
id | name | status
1 | jane doe | processed
select * from claims_history
id | name | status | timestamp
1 | jane doe | received | datetime
1 | jane doe | processed | datetime
Is there a SQL script that can do this? I'd also like the timestamp field in the claims_history table to populate automatically each time a new row is added (status change). I know I could write a Python script to handle something like this, but I'd like to keep it in SQL if at all possible. Thank you.

According to your question, you need to create a trigger that fires after an update of the claims.status column; it is very simple to do. See this link to learn how to write a simple SQL Server trigger: create a simple SQL Server trigger.
Also, since manipulating DATETIME values in a query can be awkward, I would suggest you use Unix time instead of DATETIME: store the date as a number (a BIGINT). Note that SELECT UNIX_TIMESTAMP() is a MySQL function; in SQL Server you can get the current Unix time with SELECT DATEDIFF(SECOND, '1970-01-01', GETUTCDATE()).
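A minimal sketch of what such an AFTER UPDATE trigger could look like, using the Claims and Claims_History columns from the question. It assumes Claims is updated in place rather than dropped and reloaded, and that id identifies a claim; a timestamp column on Claims_History with a GETDATE() default would record when the change happened:
CREATE TRIGGER TriggerClaims_StatusChange ON Claims
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON
    -- copy only the rows whose status actually changed
    INSERT INTO Claims_History (name, status, claim_date)
    SELECT i.name, i.status, i.claim_date
    FROM inserted AS i
    INNER JOIN deleted AS d ON i.id = d.id
    WHERE i.status <> d.status
END
GO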

A very common approach is to use a staging table and a production (or final) table. All your ETLs truncate and load the staging table (volatile), and then you execute a stored procedure that adds only the new records to your final table. This requires that all the data you handle this way have some form of key that unequivocally identifies a row.
What happens if your files suddenly change format or are badly formatted? You will drop your table and won't be able to load it back until you fix your ETL. This approach will save you from that, since the process will fail while loading the staging table and won't impact the final table. You can also keep deleted records for historic reasons instead of having them deleted.
I prefer to separate the staging tables into their proper schema, for example:
CREATE SCHEMA Staging
GO
CREATE TABLE Staging.Claims (
ID INT,
Name VARCHAR(100),
Status VARCHAR(100))
Now you do all your loads from your files into these staging tables, truncating them first:
TRUNCATE TABLE Staging.Claims
BULK INSERT Staging.Claims
FROM '\\SomeFile.csv'
WITH
--...
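For reference, the elided WITH clause is where the file format options go; a sketch assuming a comma-delimited file with a header row (the terminators and FIRSTROW value are assumptions about your files):
BULK INSERT Staging.Claims
FROM '\\SomeFile.csv'
WITH (
    FIRSTROW = 2,          -- skip a header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)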
Once this table is loaded, you execute a specific stored procedure that adds your delta between the staging content and your final table. You can add whichever logic you want here, like doing only inserts for new records, or inserting already existing values that were updated into another table. For example:
CREATE TABLE dbo.Claims (
ClaimAutoID INT IDENTITY PRIMARY KEY,
ClaimID INT,
Name VARCHAR(100),
Status VARCHAR(100),
WasDeleted BIT DEFAULT 0,
ModifiedDate DATETIME,
CreatedDate DATETIME DEFAULT GETDATE())
GO
CREATE PROCEDURE Staging.UpdateClaims
AS
BEGIN
BEGIN TRY
BEGIN TRANSACTION
-- Update changed values
UPDATE C SET
Name = S.Name,
Status = S.Status,
ModifiedDate = GETDATE()
FROM
Staging.Claims AS S
INNER JOIN dbo.Claims AS C ON S.ID = C.ClaimID -- This has to be by the key columns
WHERE
ISNULL(C.Name, '') <> ISNULL(S.Name, '') OR -- OR, not AND: a change in either column should trigger the update
ISNULL(C.Status, '') <> ISNULL(S.Status, '')
-- Insert new records
INSERT INTO dbo.Claims (
ClaimID,
Name,
Status)
SELECT
ClaimID = S.ID,
Name = S.Name,
Status = S.Status
FROM
Staging.Claims AS S
WHERE
NOT EXISTS (SELECT 'not yet loaded' FROM dbo.Claims AS C WHERE S.ID = C.ClaimID) -- This has to be by the key columns
-- Mark deleted records as deleted
UPDATE C SET
WasDeleted = 1,
ModifiedDate = GETDATE()
FROM
dbo.Claims AS C
WHERE
NOT EXISTS (SELECT 'not anymore on files' FROM Staging.Claims AS S WHERE S.ID = C.ClaimID) -- This has to be by the key columns
COMMIT
END TRY
BEGIN CATCH
DECLARE @v_ErrorMessage VARCHAR(MAX) = ERROR_MESSAGE()
IF @@TRANCOUNT > 0
ROLLBACK
RAISERROR (@v_ErrorMessage, 16, 1)
END CATCH
END
This way you always work with dbo.Claims and the records are never lost (just updated or inserted).
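Putting it together, each nightly run then reduces to truncating and reloading the staging table and calling the procedure (a sketch using the objects defined above; the file path and format options are placeholders as before):
TRUNCATE TABLE Staging.Claims

BULK INSERT Staging.Claims
FROM '\\SomeFile.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')

EXEC Staging.UpdateClaims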
If you need to check the last status of a particular claim you can create a view:
CREATE VIEW dbo.vClaimLastStatus
AS
WITH ClaimsOrdered AS
(
SELECT
C.ClaimAutoID,
C.ClaimID,
C.Name,
C.Status,
C.ModifiedDate,
C.CreatedDate,
DateRanking = ROW_NUMBER() OVER (PARTITION BY C.ClaimID ORDER BY C.CreatedDate DESC)
FROM
dbo.Claims AS C
)
SELECT
C.ClaimAutoID,
C.ClaimID,
C.Name,
C.Status,
C.ModifiedDate,
C.CreatedDate
FROM
ClaimsOrdered AS C
WHERE
DateRanking = 1
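Querying the latest status of a particular claim then becomes trivial (the ClaimID value here is just an illustration):
SELECT ClaimID, Name, Status, CreatedDate
FROM dbo.vClaimLastStatus
WHERE ClaimID = 1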

Related

Left join with multiple matches on right table with conditions

I am trying to write a SQL Server query where my left table (database 1 - Db1) has some data that needs to be checked from time to time.
I can't write to this table (Db1 is read-only), so I have created another database, Db2, and when I check the information related to some record in Db1, I save the date of that check in Db2.
Then, I have a list based on the left table that shows me all the records that have to be checked. Once I create the check record in Db2, the list should not show me that record again for some time (3-4 months).
So the query must handle this: if there is no record in Db2 (i.e. it is null), the record must be shown in the list; once I create a check, the record should not be shown for a time (3-4 months); when the 3-4 months have passed, the record has to appear in the list again to remind me to do a new check; and if I then create a new check in Db2 with a new date, the record on the left has to hide again for another 3-4 months, and so on.
I have tried the query below, which works if there is only one record in Db2, but as soon as I create another check, the old record (the first one) prevents the left record from being hidden from the list.
I hope someone can give me a clue.
Thank you in advance.
SELECT uno.fecha_ini, uno.fecha_res, uno.fecha_fin, uno.n_contrato, uno.n_propied, tres.n_contrato
FROM [GI].[dbo].[alq03] uno
LEFT JOIN [GI_impuestos].[dbo].[checkar] tres ON uno.n_contrato = tres.n_contrato
WHERE tres.n_contrato IS NULL
/*checks*/
AND (tres.fecha_checkar IS NULL)
OR (tres.fecha_checkar NOT BETWEEN (GETDATE() - 120) AND (GETDATE()))
GROUP BY uno.fecha_ini, uno.fecha_res, uno.fecha_fin, uno.n_contrato, uno.n_propied, tres.n_contrato
ORDER BY uno.fecha_fin asc
Here is a simple example of what you are asking for. You will have to modify it to suit your needs.
create table readOnly(
id int primary key,
info varchar(100) not null);
insert into readOnly values
(1,'info to check'),
(2,'more info'),
(3,'not finished yet!');
create table chk (
rid int primary key,
checked date);
insert into chk values
(1,'2021-07-01'),
(2,'2021-12-01')
select
id,
coalesce(cast(checked as char),'never') last_check,
info info_to_check
from
readOnly
left join
chk on id = rid
where
datediff(m,checked,getdate())>4
or checked is null
id | last_check | info_to_check
---|------------|------------------
1 | 2021-07-01 | info to check
3 | never | not finished yet!
db<>fiddle here

Compare COUNT results before and after operation/load in SQL

I need help for one of my cases.
Let's say that I have one table with only one column, named CustomerID, with 1500 records.
The CustomerID table is loaded into the DB 2 times per day - at 10am and 10pm.
I want to compare the CustomerID table in the morning (10am) with the one at night (10pm).
SELECT COUNT(*) from CustomerID -- 10 AM / 1500 records.
SELECT COUNT(*) from CustomerID -- 10 PM / 1510 records.
I want to check for these 10 extra records - only the count, nothing more.
The main idea is to keep track of the table and, if there are no new records at 10 PM, to tell the responsible person that the table is "broken", because the row count should grow with every load.
Thanks!
I did this recently for multiple DBs and tables, but can show you how to do it for just one table.
Instructions:
1. Create a stored procedure using the query below (update it with your db and table name).
*You will need to create the tracking table before being able to run this.
2. Put this on a job schedule for 10a and 10p.
3. Check daily, or create a visualization/dashboard using this new table as a data source, to display whether everything was loaded as it should have been.
Query:
use [YOUR DB NAME]
go
create procedure [YOUR_SCHEMA].[YOUR_NEW_AUDIT_PROC_NAME] as
insert into [TABLE_NAME_YOU_WANT_TO_CREATE_FOR_TRACKING]
select schema_name(schema_id) as [schemaname],
[tables].name as [tablename],
sum([partitions].[rows]) as [totalrowcount],
getdate() as date_checked
from sys.tables as [tables]
join sys.partitions as [partitions] on [tables].[object_id] = [partitions].[object_id] and [partitions].index_id in ( 0, 1 )
where [tables].name = '[TABLE_NAME_YOU_WANT_TRACKED]'
group by schema_name(schema_id), [tables].name;
go
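For reference, the tracking table that the procedure inserts into could be as simple as this (the name is the same placeholder used in the script above; the column types are assumptions matching the values being inserted):
create table [TABLE_NAME_YOU_WANT_TO_CREATE_FOR_TRACKING] (
    schemaname sysname,
    tablename sysname,
    totalrowcount bigint,
    date_checked datetime
)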

Dynamically Updating Columns with new Data

I am handling a SQL table with over 10K+ values; essentially it controls updating the status of a production station over the day. Currently the SQL server will report a new message at the current timestamp - ergo a new entry can be generated for the same part hundreds of times a day whilst only having the columns "Production_Status" and "TimeStamp" changed. I want to create a new table that selects unique part names and then has two other columns that control bringing up the LATEST entry for THAT part.
I have currently selected the data and reordered it so the latest timestamp is first on the list. I am currently trying to build this dynamic table but I am new to SQL.
select dateTimeStamp,partNumber,lineStatus
from tblPLCData
where lineStatus like '_ Zone %' or lineStatus = 'Production'
order by dateTimeStamp desc;
The expected result should be a NewTable with the row count based on how many parts are in our total production facility - this column will be static - then two other columns that will check the original table for the latest status and timestamp and update those two columns in the NewTable.
I don't need help with the table creation but more the logic that surrounds the updating of rows based off of another table.
Much Appreciated.
It looks like you could take advantage of a subquery that finds the MAX lineStatusdate for each partNumber, then joins back to the original table so that you can get the corresponding lineStatus value for the record with the max date. I just have you inserting/updating a table variable, but this can be the general approach you could take.
-- New table that might already exist in your db, I am creating one here
declare @NewTable table (
partNumber int,
lineStatus varchar(max),
last_update datetime
)
-- To initially set up your table or to update your table later with new part numbers that were not added before
insert into @NewTable
select tpd.partNumber, tpd.lineStatus, tpd.lineStatusdate
from tblPLCData tpd
join (
select partNumber, MAX(lineStatusdate) lineStatusDateMax
from tblPLCData
group by partNumber
) maxStatusDate on tpd.partNumber = maxStatusDate.partNumber
and tpd.lineStatusdate = maxStatusDate.lineStatusDateMax
left join @NewTable nt on tpd.partNumber = nt.partNumber
where (tpd.lineStatus like '_ Zone %' or tpd.lineStatus = 'Production') and nt.partNumber is null
-- To update your table whenever you deem it necessary to refresh it. I try to avoid triggers in my dbs
update nt set nt.lineStatus = tpd.lineStatus, nt.last_update = tpd.lineStatusdate
from tblPLCData tpd
join (
select partNumber, MAX(lineStatusdate) lineStatusDateMax
from tblPLCData
group by partNumber
) maxStatusDate on tpd.partNumber = maxStatusDate.partNumber
and tpd.lineStatusdate = maxStatusDate.lineStatusDateMax
join @NewTable nt on tpd.partNumber = nt.partNumber
where tpd.lineStatus like '_ Zone %' or tpd.lineStatus = 'Production'

Get all missing values between two limits in SQL table column

I am trying to assign ID numbers to records that are being inserted into an SQL Server 2005 database table. Since these records can be deleted, I would like these records to be assigned the first available ID in the table. For example, if I have the table below, I would like the next record to be entered at ID 4 as it is the first available.
| ID | Data |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 5 | ... |
The way that I would prefer this to be done is to build up a list of available ID's via an SQL query. From there, I can do all the checks within the code of my application.
So, in summary, I would like an SQL query that retrieves all available ID's between 1 and 99999 from a specific table column.
First build a table of all N IDs.
declare @allPossibleIds table (id integer)
declare @currentId integer
select @currentId = 1
while @currentId < 1000000
begin
insert into @allPossibleIds
select @currentId
select @currentId = @currentId+1
end
Then, left join that table to your real table. You can select MIN if you want, or you could limit your allPossibleIDs to be less than the max table id
select a.id
from @allPossibleIds a
left outer join YourTable t
on a.id = t.Id
where t.id is null
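If you only want the single lowest free ID, as mentioned above, the same join can simply be wrapped in MIN():
select min(a.id) as FirstFreeId
from @allPossibleIds a
left outer join YourTable t
on a.id = t.Id
where t.id is null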
Don't go for identity.
Let me give you an easy option while I work on a proper one.
Store the integers from 1-999999 in a table, say Insert_sequence.
Then write a stored procedure for insertion:
you can easily identify the minimum value that is present in your Insert_sequence and not in
your main table, store this value in a variable, and insert the row with the ID from that variable.
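A minimal sketch of such an insert procedure (Insert_sequence with a num column, MainTable, and the Data parameter are illustrative assumptions):
CREATE PROCEDURE InsertWithReusedId
    @Data VARCHAR(100)
AS
BEGIN
    DECLARE @NewId INT
    -- lowest value in Insert_sequence that is not yet used in the main table
    SELECT @NewId = MIN(s.num)
    FROM Insert_sequence s
    WHERE NOT EXISTS (SELECT 1 FROM MainTable m WHERE m.ID = s.num)

    INSERT INTO MainTable (ID, Data) VALUES (@NewId, @Data)
END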
Regards
Ashutosh Arya
You could also loop through the keys. And when you hit an empty one Select it and exit Loop.
DECLARE @intStart INT, @loop bit
SET @intStart = 1
SET @loop = 1
WHILE (@loop = 1)
BEGIN
IF NOT EXISTS(SELECT [Key] FROM [Table] Where [Key] = @intStart)
BEGIN
SELECT @intStart as 'FreeKey'
SET @loop = 0
END
SET @intStart = @intStart + 1
END
GO
From there you can use the key as you please. Setting an @intStop variable to limit the loop would be no problem.
Why do you need a table from 1..999999? All the information you need is in your source table. Here is a query which gives you the minimal ID to insert in a gap.
It works for all combinations:
(2,3,4,5) -> 1
(1,2,3,5) -> 4
(1,2,3,4) -> 5
SQLFiddle demo
select min(t1.id)+1 from
(
select id from t
union
select 0
)
t1
left join t as t2 on t1.id=t2.id-1
where t2.id is null
Many people use an auto-incrementing integer or long value for the Primary Key of their tables, and it is often called ID or MyEntityID or something similar. This column, since it's just an auto-incrementing integer, often has nothing to do with the data being stored itself.
These types of "primary keys" are called surrogate keys. They have no meaning. Many people like these types of IDs to be sequential because it is "aesthetically pleasing", but this is a waste of time and resources. The database could care less about which IDs are being used and which are not.
I would highly suggest you forget trying to do this and just leave the ID column auto-increment. You should also create an index on your table that is made up of those (subset of) columns that can uniquely identify each record in the table (and even consider using this index as your primary key index). In rare cases where you would need to use all columns to accomplish that, that is where an auto-incrementing primary key ID is extremely useful—because it may not be performant to create an index over all columns in the table. Even so, the database engine could care less about this ID (e.g. which ones are in use, are not in use, etc.).
Also consider that an integer-based ID has a maximum total of 4.2 BILLION IDs. It is quite unlikely that you'll exhaust the supply of integer-based IDs in any short amount of time, which further bolsters the argument for why this sort of thing is a waste of time and resources.

SQL standard select current records from an audit log question

My memory is failing me. I have a simple audit log table based on a trigger:
ID int (identity, PK)
CustomerID int
Name varchar(255)
Address varchar(255)
AuditDateTime datetime
AuditCode char(1)
It has data like this:
ID CustomerID Name    Address             AuditDateTime           AuditCode
1  123        Bob     123 Internet Way    2009-07-17 13:18:06.353 I
2  123        Bob     123 Internet Way    2009-07-17 13:19:02.117 D
3  123        Jerry   123 Internet Way    2009-07-17 13:36:03.517 I
4  123        Bob     123 My Edited Way   2009-07-17 13:36:08.050 U
5  100        Arnold  100 SkyNet Way      2009-07-17 13:36:18.607 I
6  100        Nicky   100 Star Way        2009-07-17 13:36:25.920 U
7  110        Blondie 110 Another Way     2009-07-17 13:36:42.313 I
8  113        Sally   113 Yet another Way 2009-07-17 13:36:57.627 I
What would be an efficient select statement to get all the most current records between a start and end time? FYI: I is for insert, D for delete, and U for update.
Am I missing anything in the audit table? My next step is to create an audit table that only records changes, yet you can extract the most recent records for the given time frame. For the life of me I cannot find it on any search engine easily. Links would work too. Thanks for the help.
Another (better?) method to keep audit history is to use a 'startDate' and 'endDate' column rather than an auditDateTime and AuditCode column. This is often the approach in tracking Type 2 changes (new versions of a row) in data warehouses.
This lets you more directly select the current rows (WHERE endDate is NULL), and you will not need to treat updates differently than inserts or deletes. You simply have three cases:
Insert: copy the full row along with a start date and NULL end date
Delete: set the End Date of the existing current row (endDate is NULL)
Update: do a Delete then Insert
Your select would simply be:
select * from AuditTable where endDate is NULL
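For illustration, the three cases above could be handled roughly like this, assuming AFTER INSERT / AFTER DELETE triggers on the source customer table and an AuditTable with the startDate/endDate columns described above (column names are assumptions):
-- Insert: copy the full row along with a start date and a NULL end date
INSERT INTO AuditTable (CustomerID, Name, Address, startDate, endDate)
SELECT i.CustomerID, i.Name, i.Address, GETDATE(), NULL
FROM inserted AS i

-- Delete: close out the existing current row
UPDATE a SET endDate = GETDATE()
FROM AuditTable AS a
INNER JOIN deleted AS d ON a.CustomerID = d.CustomerID
WHERE a.endDate IS NULL

-- Update: run the delete step and then the insert step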
Anyway, here's my query for your existing schema:
declare @from datetime
declare @to datetime
select b.* from (
select
customerId,
max(auditdatetime) 'auditDateTime'
from
AuditTable
where
auditcode in ('I', 'U')
and auditdatetime between @from and @to
group by customerId
having
/* rely on "current" being defined as INSERTS > DELETES */
sum(case when auditcode = 'I' then 1 else 0 end) >
sum(case when auditcode = 'D' then 1 else 0 end)
) a
cross apply(
select top 1 customerId, name, address, auditdateTime
from AuditTable
where auditdatetime = a.auditdatetime and customerId = a.customerId
) b
References
A cribsheet for data warehouses, but has a good section on type 2 changes (what you want to track)
MSDN page on data warehousing
Ok, a couple of things for audit log tables.
For most applications, we want audit tables to be extremely quick on insertion.
If the audit log is truly for diagnostic or for very irregular audit reasons, then the quickest insertion criteria is to make the table physically ordered upon insertion time.
And this means to put the audit time as the first column of the clustered index, e.g.
create unique clustered index idx_mytable on mytable(AuditDateTime, ID)
This will allow for extremely efficient select queries upon AuditDateTime O(log n), and O(1) insertions.
If you wish to look up your audit table on a per CustomerID basis, then you will need to compromise.
You may add a nonclustered index upon (CustomerID, AuditDateTime), which will allow for O(log n) lookup of per-customer audit history, however the cost will be the maintenance of that nonclustered index upon insertion - that maintenance will be O(log n) conversely.
However that insertion time penalty may be preferable to the table scan (that is, O(n) time complexity cost) that you will need to pay if you don't have an index on CustomerID and this is a regular query that is performed.
An O(n) lookup for an irregular query locks the table for the duration and can block writers, so it is sometimes in the writers' interest to be slightly slower (maintaining the extra index) if that guarantees readers won't block their commits by having to table scan for lack of a good index to support them.
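For example, the nonclustered index discussed above could be created like this (the table name follows the earlier clustered index example; the index name is illustrative):
create nonclustered index idx_mytable_customer on mytable(CustomerID, AuditDateTime)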
Addition: if you are looking to restrict to a given timeframe, the most important thing first of all is the index upon AuditDateTime. And make it clustered as you are inserting in AuditDateTime order. This is the biggest thing you can do to make your query efficient from the start.
Next, if you are looking for the most recent update for all CustomerID's within a given timespan, well thereafter a full scan of the data, restricted by insertion date, is required.
You will need to do a subquery upon your audit table, between the range,
select CustomerID, max(AuditDateTime) MaxAuditDateTime
from AuditTrail
where AuditDateTime >= @begin and AuditDateTime <= @end
and then incorporate that into your select query proper, eg.
select AuditTrail.* from AuditTrail
inner join
(select CustomerID, max(AuditDateTime) MaxAuditDateTime
from AuditTrail
where AuditDateTime >= @begin and AuditDateTime <= @end
) filtration
on filtration.CustomerID = AuditTrail.CustomerID and
filtration.AuditDateTime = AuditTrail.AuditDateTime
Another approach is to use a sub-select that finds the latest AuditDateTime per customer and join back to it:
select a.ID
, a.CustomerID
, a.Name
, a.Address
, a.AuditDateTime
, a.AuditCode
from myauditlogtable a,
(select s.CustomerID as maxCustomerId, max(s.AuditDateTime) as maxAuditDateTime
from myauditlogtable as s
group by s.CustomerID)
as subq
where subq.maxCustomerId = a.CustomerID
and subq.maxAuditDateTime = a.AuditDateTime;
Start and end time? e.g. as in between 1am and 3am?
Or start and end datetime? e.g. as in 2009-07-17 13:36 to 2009-07-18 13:36?