I have a table person with columns of interest
id, ab_num, is_valid_act
The is_valid_act value is modified by users frequently.
We have been asked to build a service for another team that will give them only the changes happening to the is_valid_act column of the person table; they will call it X times a day.
The initial thought was to create a new table with id, person_id, ab_num, is_valid_act, populated by a trigger that fires any time the person.is_valid_act column is modified.
Then the service can return the records where id is greater than the last id the other team passes in, and we would manually seed their database.
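Roughly, we were picturing something like this (Oracle sketch with an identity column, since we are also considering a Materialized View Log; the column types are guesses):

CREATE TABLE person_validity_change (
    id            NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    person_id     NUMBER NOT NULL,
    ab_num        NUMBER,
    is_valid_act  NUMBER(1),
    changed_at    TIMESTAMP DEFAULT SYSTIMESTAMP
);

CREATE OR REPLACE TRIGGER person_validity_change_trg
AFTER UPDATE OF is_valid_act ON person
FOR EACH ROW
BEGIN
    INSERT INTO person_validity_change (person_id, ab_num, is_valid_act)
    VALUES (:NEW.id, :NEW.ab_num, :NEW.is_valid_act);
END;
/

-- the service would hand back everything newer than the last id the other team saw
SELECT id, person_id, ab_num, is_valid_act
FROM person_validity_change
WHERE id > :last_seen_id
ORDER BY id;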
But we worry about that table becoming so big that it will lose performance.
We could also go date-based instead, but we feel that could be error prone.
Another idea is to create a materialized view that only picks up the newest records, using a Materialized View Log.
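For the materialized view route, the rough shape we had in mind is below (Oracle sketch; whether a fast refresh actually covers our case still needs checking):

CREATE MATERIALIZED VIEW LOG ON person
    WITH PRIMARY KEY, ROWID (is_valid_act)
    INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW person_is_valid_act_mv
    REFRESH FAST ON DEMAND
AS
    SELECT id, ab_num, is_valid_act
    FROM person;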
Any thoughts/ideas on which way is better to go?
Thank you
Related
I have got a rather basic question but I could not find a confirmation about it online. When you create a view like the one below
create view report AS
select employee_id
from employees
It will store the data in a virtual table. That's OK. But when you add additional employee ids AFTER you have created the view, will they be displayed when you run the view again? Because what I need is basically a view that will display the latest records I have added to the tables. Is that possible?
Short answer is yes, it will update.
OK, so views don't quite "store" data; they just present data in a different format, or select certain columns from a table, to create your own "view" of the data.
If you are just looking to find the most recent employee ids through a view, I would recommend adding a created or modified date column that defaults to the date the row was entered. Then have your view order by that date field descending and select only the top few rows, so you only get recent records. The way to do this is slightly different depending on whether you are using SQL Server, Oracle, or MySQL.
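For example, in SQL Server that could look something like this (assuming a created_date column as described):

CREATE VIEW recent_employees AS
SELECT TOP (10)
    employee_id,
    created_date
FROM employees
ORDER BY created_date DESC;   -- the ORDER BY here only decides which rows TOP keeps
GO

-- order again when querying, since a view does not guarantee output order
SELECT employee_id FROM recent_employees ORDER BY created_date DESC;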
I want to design a changelog for a few tables. Let's call one of them restaurant. Every time a user modifies the list of restaurants, the change should be logged.
Idea 1
My first idea was to create 2 tables. One contains all the restaurant versions: RESTAURANT_VALUE (restaurantId*, restaurantValueId*, address, phone, ..., username, insertDate). Every time a change is made, it creates a new entry. Then a table RESTAURANT (restaurantId*, restaurantValueId) links to the currently valid restaurantValueId. So one table holds the current and all previous versions, and the other points to the current one.
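Roughly, Idea 1 would be (column types are only placeholders):

CREATE TABLE RESTAURANT_VALUE (
    restaurantId      INT NOT NULL,
    restaurantValueId INT NOT NULL,
    address           VARCHAR(255),
    phone             VARCHAR(32),
    -- ... further attributes ...
    username          VARCHAR(64),
    insertDate        DATETIME,
    PRIMARY KEY (restaurantId, restaurantValueId)
);

CREATE TABLE RESTAURANT (
    restaurantId      INT NOT NULL PRIMARY KEY,
    restaurantValueId INT NOT NULL   -- points at the currently valid version
);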
Idea 2
It starts with 2 tables as well. One of them contains all current restaurants, e.g. RESTAURANT_CURRENT. A second table contains all changes: RESTAURANT_HISTORY. Therefore both need to have exactly the same columns. Every time a change occurs, the values from the 'current' table are copied into the history table, and the new version is written into the 'current' table.
My opinion
Idea 1 doesn't care whether columns will ever be added or not, so maintenance and adding of columns would be easy. However, I think as the database grows... wouldn't it slow down? Idea 2 has the advantage that the table with the current values never holds any 'old' stuff and doesn't get crowded.
Theoretically, I think Idea 1 is the one to go with.
What do you think? Would you go for Idea 1, or for another approach? Are there any other important practical considerations I am not aware of?
The approach strongly depends on your needs. Why would you want a history table?
If it's just for auditing purposes, then make a separate restaurant_history table (idea 2) to keep the history aside. If you want to present the history in the application, then go for a single restaurants table with one of the options below:
seq_no - a record version number incremented with each update. If you need the current data, you must search for the highest seq_no for the given restaurant_id(s), so optionally also use a current marker, allowing a straightforward current = true filter
valid_from, valid_to - where valid_to is NULL for the current record
Sometimes there is a need to query efficiently exactly which attributes changed. To do this easily, you can consider a history table at the attribute level: (restaurant_id, attribute, old_value, new_value, change_date, user).
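As a sketch of that attribute-level option (names and types are only illustrative):

CREATE TABLE restaurant_attribute_history (
    restaurant_id INT          NOT NULL,
    attribute     VARCHAR(64)  NOT NULL,   -- e.g. 'address', 'phone'
    old_value     VARCHAR(255) NULL,
    new_value     VARCHAR(255) NULL,
    change_date   DATETIME     NOT NULL,
    username      VARCHAR(64)  NOT NULL
);

-- "what exactly changed on restaurant 42, and who changed it?"
SELECT attribute, old_value, new_value, change_date, username
FROM restaurant_attribute_history
WHERE restaurant_id = 42
ORDER BY change_date;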
WARNING: This tale of woe contains examples of code smells, poor design decisions, and technical debt.
If you are conversant with SOLID principles, practice TDD and unit test your work, DO NOT READ ON. Unless you want a good giggle at someone's misfortune and gloat in your own awesomeness knowing that you would never leave behind such a monumental pile of crap for your successors.
So, if you're sitting comfortably then I'll begin.
In this app that I have inherited and been supporting and bug fixing for the last 7 months I have been left with a DOOZY of a balls up by a developer that left 6 and a half months ago. Yes, 2 weeks after I started.
Anyway. In this app we have clients, employees and visits tables.
There is also a table called AppNewRef (or something similar) that ... wait for it ... contains the next record ID to use for each of the other tables. So, it may contain data such as:
TypeID   Description   NextRef
1        Employees     804
2        Clients       1708
3        Visits        56783
When the application creates new rows for Employees, it looks in the AppNewRef table, gets the value, uses that value for the ID, and then updates the NextRef column. Same thing for Clients, and Visits and all the other tables whose NextID to use is stored in here.
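As far as I can tell, the logic boils down to something like this (T-SQL sketch of what the VB6 code effectively does; the Visits column names are my guess):

DECLARE @nextRef INT;

BEGIN TRANSACTION;

    -- reserve the next Visits ID (TypeID 3) and bump the counter
    SELECT @nextRef = NextRef FROM AppNewRef WHERE TypeID = 3;
    UPDATE AppNewRef SET NextRef = @nextRef + 1 WHERE TypeID = 3;

    INSERT INTO Visits (VisitID) VALUES (@nextRef);   -- plus whatever else a visit row holds

COMMIT TRANSACTION;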
Yes, I know, no auto-numbering IDENTITY columns on this database. All under the excuse of "when it was an Access app". These ID's are held in the (VB6) code as longs. So, up to 2,147,483,647 records possible. OK, that seems to work fairly well. (apart from the fact that the app is updating and taking care of locking / updating, etc., and not the database)
So, our users are quite happily creating Employees, Clients, Visits etc. The Visits ID is steadily increasing a few dozen at a time. Then the problems happen. Our clients are causing database corruptions while creating batches of visits because the server is beavering away nicely, and the app becomes unresponsive. So they kill the app using task manager instead of being patient and waiting. Granted, the app does seem to lock up.
Roll on to earlier this year and developer Tim (real name. No protecting the guilty here) starts to modify the code to do the batch updates in stages, so that the UI remains 'responsive'. Then April comes along, and he's working his notice (you can picture the scene now, can't you ?) and he's beavering away to finish the updates.
End of April, and beginning of May we update some of our clients. Over the next few months we update more and more of them.
Unseen by Tim (real name, remember) and me (who started two weeks before Tim left) and the other new developer that started a week after, the ID's in the visit table start to take huge leaps upwards. By huge, I mean 10000, 20000, 30000 at a time. Sometimes a few hundred thousand.
Here's a graph that illustrates the rapid increase in IDs used.
Roll on November. Customer phones Tech Support and reports that he's getting an error. I look at the error message and ask for the database so I can debug the code. I find that the value is too large for a long. I do some queries, pull the information, drop it into Excel and graph it.
I don't think making the code handle anything longer than a long for the ID's is the right approach, as this app passes that ID into other DLL's and OCX's and breaking the interface on those just seems like a whole world of hurt that I don't want to encounter right now.
One potential idea that I'm investigating is to try to modify the ID's so that I can get them back down into a lower range, essentially filling the gaps, using the ROW_NUMBER function.
What I'm thinking of doing is adding a new column to each of the tables that have a Foreign Key reference to these Visit ID's (not a proper foreign key mind, those constraints don't exist in this database). This new column could store the old (current) value of the Visit ID (oh, just to confuse things; on some tables it's called EventID, and on some it's called VisitID).
Then, for each of the other tables that refer to that VisitID, update to the new value.
Ideas ? Suggestions ? Snippets of T-SQL to help all gratefully received.
Option one:
Explicitly constrain all of your foreign key relationships, and set them to be ON UPDATE CASCADE.
This will mean that whenever you change the ID, the foreign keys will automatically be updated.
Then you just run something like this...
WITH
resequenced AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY id) AS newID,
*
FROM
yourTable
)
UPDATE
resequenced
SET
id = newID
I haven't done this in ages, so I forget if it causes problems mid-update by having two records with the same id value. If it does, you could do something like this first...
UPDATE yourTable SET id = -id
Option two:
Ensure that none of your foreign key relationships are explicitly defined. If they are, note them down and remove them.
Then do something like...
CREATE TABLE temp (
    newID INT IDENTITY (1,1),
    oldID INT
)

/* Build the old-to-new ID mapping */
INSERT INTO temp (oldID) SELECT id FROM yourTable ORDER BY id

/* Do this once for the id column of the table you are re-identifying, */
/* then repeat it for every fact table holding that ID as a foreign key */
UPDATE
    factTable
SET
    foreignID = temp.newID
FROM
    factTable
    INNER JOIN temp ON factTable.foreignID = temp.oldID
Then re-apply any existing foreign key relationships.
This is a pretty scary option. If you forget to update a table, you just borked your data. But, you can give that temp table a much nicer name and KEEP it.
Good luck. And may the lord have mercy on your soul. And Tim's if you ever meet him in a dark alley.
I would create a numbers table that holds just a sequence from 1 up to whatever the max for a long is, with an increment of 1, and then change the logic of getting the max id for VisitID (and maybe the others) to do a right join between the numbers and the visits table. Then you can just look for the min of the numbers that aren't used yet:
select min(number) from visits right join numbers on visits.id = numbers.number where visits.id is null
That way you get all the gaps filled in without having to change any of the other tables.
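If it helps, the numbers table itself can be generated in one go with something like this (SQL Server sketch; size it for whatever range you actually need):

CREATE TABLE numbers (number INT PRIMARY KEY);

INSERT INTO numbers (number)
SELECT TOP (1000000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;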
But I would just redo the whole database.
Say I have 6 tables:
Workstation
Workstation_CL
Location
Location_CL
Features
Features_CL
I am currently using triggers to do inserts into the "_CL" version of each table with an additional field that denotes whether the change was an "UPDATE", "INSERT" or "DELETE".
The Workstation table keeps track of the "modified_by" user. If a user updates the location of a "Workstation" object, the "Location" table gets updated as well as the "Workstation" table. The only modification to the Workstation table is the "modified_by" field, so that I will know who made the change.
The problem I am having is when I think about pulling an audit report. How will I link records in the "Location_CL" to the ones in the "Workstation_CL", when both are populated by separate triggers?
Somehow my question portion was erased. Sorry about that.
Question: how can I put some type of unique identifier into both the "Workstation_CL" and the "Location_CL" so that I can identify each revision? For instance, when I pull all records from the "Location_CL" and see all location changes, how do I pull the username from the "Workstation_CL" of whoever made the location change?
Give each revision a GUID generated by the trigger. Populate a field (RevisionId) in both tables with the value.
You need 2, maybe 3 columns on each audit table.
1) Timestamp, so you know when the changes were made.
2) User changed, so you can track who made the changes - I assume that Location can change independently of Workstation.
3) You might need an identifier for the transaction, too. I THINK you can get an id from the DB, though I'm not sure.
I don't think you can have an effective report without timestamps and users, though, and I don't think you should have the user on only one table.
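As a sketch (SQL Server; column names are placeholders, and the same change applies to Location_CL and the other _CL tables):

ALTER TABLE Workstation_CL ADD
    change_timestamp DATETIME    NOT NULL DEFAULT GETDATE(),   -- 1) when the change was made
    changed_by       VARCHAR(64) NULL,                         -- 2) who made it
    transaction_id   BIGINT      NULL;                         -- 3) groups rows written in the same transaction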
During the trigger event, I was able to exec the following:
DECLARE @trans_id BIGINT;
SELECT @trans_id = transaction_id FROM sys.dm_tran_current_transaction;
which gives me the transaction id for the current operation.
With that, I am able to insert it into the corresponding _CL table and then perform selects that match the rows on that transaction id.
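For completeness, here is roughly how that sits inside one of the triggers, plus the report join (trimmed-down sketch; the _CL column names are approximations):

CREATE TRIGGER trg_Location_audit ON Location
AFTER UPDATE
AS
BEGIN
    DECLARE @trans_id BIGINT;

    -- every trigger firing inside the same transaction sees the same transaction_id
    SELECT @trans_id = transaction_id FROM sys.dm_tran_current_transaction;

    INSERT INTO Location_CL (location_id, change_type, transaction_id)
    SELECT i.id, 'UPDATE', @trans_id
    FROM inserted i;
END;
GO

-- audit report: match _CL rows written by the same transaction
SELECT w.modified_by, l.*
FROM Location_CL l
JOIN Workstation_CL w ON w.transaction_id = l.transaction_id;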
For my application there are several entity classes: User, Customer, Post, and so on.
I'm about to design the database and I want to store the date when the entities were created and updated. This is where it gets tricky. Sure, one option is to add created_timestamp and update_timestamp columns to each of the entity tables, but isn't that redundant?
Another possibility could be to create a log table that stores this information, and it could be made to keep track of updates for any entity.
Any thoughts? I'm leaning toward implementing the latter.
The single-log-table-for-all-tables approach has two main problems that I can think of:
The design of the log table will (probably) constrain the design of all the other tables. Most likely the log table would have one column named TableName and then another column named PKValue (which would store the primary key value for the record you're logging). If some of your tables have compound primary keys (i.e. more than one column), then the design of your log table would have to account for this (probably by having columns like PKValue1, PKValue2 etc.).
If this is a web application of some sort, then the user identity that would be available from a trigger would be the application's account, instead of the ID of the web app user (which is most likely what you really want to store in your CreatedBy field). This would only help you distinguish between records created by your web app code and records created otherwise.
CreatedDate and ModifiedDate columns aren't redundant just because they're defined in each table. I would stick with that approach and put insert and update triggers on each table to populate those columns. If I also needed to record the end-user who made the change, I would skip the triggers and populate the timestamp and user fields from my application code.
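For example, something along these lines (SQL Server sketch with a hypothetical Customer table; a default covers the insert case and a trigger covers updates):

ALTER TABLE Customer ADD
    CreatedDate  DATETIME NOT NULL DEFAULT GETDATE(),
    ModifiedDate DATETIME NOT NULL DEFAULT GETDATE();
GO

CREATE TRIGGER trg_Customer_modified ON Customer
AFTER UPDATE
AS
BEGIN
    -- stamp the rows that were just updated (direct trigger recursion is off by default)
    UPDATE c
    SET ModifiedDate = GETDATE()
    FROM Customer c
    JOIN inserted i ON i.CustomerId = c.CustomerId;
END;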
I do the latter, with a "log" or "events" table. In my experience, the "updated" timestamp becomes frustrating pretty quickly, because a lot of the time you find yourself in a fix where you want more than just the very latest update time.
How often will you need to include the created/updated timestamps in your presentation layer? If the answer is anything more than "once in a great great while", I think you would be better served by having those columns in each table.
On a project I worked on a couple of years ago, we implemented triggers which updated what we called an audit table (it stored basic information about the changes being made, one audit table per table). This included modified date (and last modified).
They were only applied to key tables (not joins or reference data tables).
This removed a lot of the normal frustration of having to account for LastCreated & LastModified fields, but introduced the annoyance of keeping the triggers up to date.
In the end the trigger/audit table design worked well and all we had to remember was to remove and reapply the triggers before ETL(!).
It's for a web based CMS I work on. The creation and last updated dates will be displayed on most pages and there will be lists for the last created (and updated) pages. The admin interface will also use this information.