SQL: splitting up WITH statements into read-only tables

I am working on a project in SQLite where several INSERT statements involve recursive WITH statements. These INSERT statements live inside triggers that fire when INSERT statements run against other tables.
I've found that this leads to many values being recalculated every time the WITH statement runs. Rather than add logic to the WITH statement to recalculate only the values that actually need it (which creates sprawl and is hard for me to maintain), I'd rather take some of the temporary views defined in the WITH statement and turn them into permanent tables that cache values. I'd then chain triggers so that an update of table_1 leads to an update of table_2, which leads to an update of table_3, etc. This makes the code easier to debug and maintain.
The issue is that, in the project, I want to make a clear distinction between tables that are user-editable and tables that are meant for caching intermediary values. One way to do this (perhaps) is to make the "caching" tables read-only EXCEPT when written via certain triggers. I've seen a few questions about similar issues in other SQL dialects, but nothing pertaining to SQLite and nothing aimed at this particular problem, hence the new question.
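To illustrate the kind of guard I have in mind, here is a rough sketch (the table and column names are made up, and the same idea would need matching BEFORE UPDATE/DELETE triggers for full read-only behaviour):

-- One-row flag table: 0 = caching table is locked, 1 = a trigger is allowed to write
CREATE TABLE IF NOT EXISTS cache_write_guard (allowed INTEGER NOT NULL DEFAULT 0);
INSERT INTO cache_write_guard (allowed) VALUES (0);

-- Reject direct INSERTs into the caching table unless the flag is set
CREATE TRIGGER IF NOT EXISTS cache_table_readonly BEFORE INSERT ON cache_table
WHEN (SELECT allowed FROM cache_write_guard) = 0
BEGIN
    SELECT RAISE(ABORT, 'cache_table is only written by triggers');
END;

-- The trigger on the user-editable table briefly raises the flag around its own write
CREATE TRIGGER IF NOT EXISTS user_table_refresh AFTER INSERT ON user_table
BEGIN
    UPDATE cache_write_guard SET allowed = 1;
    INSERT INTO cache_table (id, val) VALUES (NEW.id, NEW.val);
    UPDATE cache_write_guard SET allowed = 0;
END;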
Any help would be appreciated!

Related

Are there any shortcuts for creating a table variable to match the columns in a view?

I frequently need to create table variables within procedures, to store the top 25 rows from a view for processing.
I need to store these small batches in variables temporarily, because I'm performing numerous operations that modify data within the underlying tables, and some of these operations cause the rows to no longer appear within the view itself based on the view criteria (this is by design).
I need to keep the data around for the entire processing session, and I can't rely on the view itself to remain consistent through the operation.
The problem is that we do this in many places across multiple databases, so whenever we change the columns in any of our views the code becomes bug-prone: we also have to remember to modify the relevant table types, without typos, mistakes, or omissions.
So my question is: can we declare table variables (or table types, if necessary) simply by stating "match the current columns in this view"?
That would make things much easier, since it would automatically keep all relevant table variables in sync with the current layout of the views in question, and eliminate the headache that comes with trying to keep it all straight manually.
If no such shortcut exists, then I guess we'll just have to create custom table types matching our views as needed, to at least keep it as centralized as possible.
If the table variable can be replaced by a temporary table, something like:
SELECT * INTO #TempTable FROM myView
will do the job perfectly.
With SELECT INTO, the table is created with the columns and metadata available in your SELECT statement.
Hope this helps.
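For instance, a rough sketch of the temp-table version of the top-25 workflow (the view and column names here are only placeholders):

SELECT TOP (25) *
INTO #WorkBatch               -- the temp table picks up whatever columns the view currently has
FROM dbo.MyQueueView
ORDER BY QueuedAt;

-- ... run the operations that modify the underlying tables ...

DROP TABLE #WorkBatch;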

Help updating a column using other columns of the same table

Table: Customer with columns Start_Time and End_Time.
I need to add a new column "Duration" that is End_Time - Start_Time.
However, I need to do this using a trigger or procedure so that immediately after a new record is added to Customer table, the column Duration is updated.
If you are using MS SQL, the ideal answer is probably a computed column.
The less data you actually duplicate, the less opportunity for data inconsistency you will have, therefore the less consistency-ensuring/verification code and fewer maintenance processes will result from your schema.
To set this up (again, if using MS SQL), just add another column using the designer and expand the "Computed Column Specification" area. (You can refer to other columns of the same table in this calculation.) Then enter "End_Time - Start_Time". Depending on what you are going to do with this data, you may want to use something like DATEDIFF(minute, Start_Time, End_Time) for your formula instead. This is exactly what the feature is for.
If it is a very expensive calculation (which yours is probably not, from the information you've given) you could configure the results to be "persisted" - that's very much like a trigger but clearer to implement and maintain.
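For reference, a sketch of the same thing in T-SQL rather than the designer (assuming Start_Time and End_Time are datetime columns, and picking one of the two variants):

ALTER TABLE Customer
    ADD Duration AS DATEDIFF(minute, Start_Time, End_Time);

-- or, if the calculation were expensive enough to be worth storing:
ALTER TABLE Customer
    ADD Duration AS DATEDIFF(minute, Start_Time, End_Time) PERSISTED;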
Alternately, you could create a new View that does the same calculation, and "project" this first table through it whenever getting information. But you probably already knew that, thus this answer was born! :)
p.s. I personally recommend avoiding triggers like the plague. They cause extra operations that are often not expected by a developer, maintainer, or admin. This can cause operations to fail, return unexpected extra result sets, or modify rows that perhaps an admin was specifically trying to avoid modifying during an administrative (read: unsupported grin) fix.
p.p.s. In this case I'd also recommend against a stored procedure, for the same maintenance reason as triggers. Although you could restrict security such that the only way to update the table was through a stored procedure, this can fail for many of the same reasons triggers can fail. Best to avoid duplicating the data if you can.
p.p.p.s :) This is not to say stored procedures are bad as a whole. On complex transactional operations or tightly integrated procedural filtering of large related tables in order to return a comparatively small result set they are still often the best choice.
As per Shannon, though the term in Oracle is a "virtual column".
This was an 11g enhancement. Prior to that, use a view (and that is still a potential answer in 11g).
Do not use a trigger or stored procedure.
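A sketch of the 11g virtual-column equivalent (assuming Start_Time and End_Time are DATE columns, so their difference is in days):

ALTER TABLE Customer
    ADD (Duration AS ((End_Time - Start_Time) * 24 * 60));   -- duration in minutes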

sql: DELETE + INSERT vs UPDATE + INSERT

A similar question has been asked, but since it always depends, I'm asking for my specific situation separately.
I have a web-site page that shows data coming from a database, and generating that data requires some fairly complex multi-join queries.
The data is being updated once a day (nightly).
I would like to pre-generate the data for that view to speed up page access.
For that, I am creating a table that contains the exact data I need.
Question: in my situation, is it reasonable to do a complete table wipe followed by INSERT, or should I do UPDATE + INSERT?
SQL-wise, DELETE + INSERT seems easier (the INSERT part is a single SQL statement).
EDIT: RDBMS: MS SQL Server 2008 Ent
TRUNCATE will be faster than DELETE, so if you need to empty a table, do that instead.
You didn't specify your RDBMS vendor, but some of them also have MERGE/UPSERT commands. These let you update the row if the data exists and insert it if it doesn't.
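On SQL Server 2008, for example, a MERGE along these lines would do it (the table and column names are only placeholders):

MERGE dbo.ReportCache AS target
USING dbo.ReportSource AS source
    ON target.Id = source.Id
WHEN MATCHED THEN
    UPDATE SET target.Value = source.Value
WHEN NOT MATCHED THEN
    INSERT (Id, Value) VALUES (source.Id, source.Value);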
It partly depends on how the data is accessed. If you have a period of time with no (or very few) users accessing it, then there won't be much impact on the data disappearing (between the DELETE and the completion of the INSERT) for a short while.
Have you considered using a materialized view (MSSQL calls them indexed views) instead of doing it manually? This could also have other performance benefits, as an indexed view gives the query optimizer more choices when it's constructing execution plans for other queries that reference the table(s) in the view.
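As a sketch, an indexed view needs SCHEMABINDING and a unique clustered index; something like the following (names are placeholders, and Amount is assumed NOT NULL):

CREATE VIEW dbo.DailyTotals
WITH SCHEMABINDING
AS
SELECT OrderDate, SUM(Amount) AS TotalAmount, COUNT_BIG(*) AS RowCnt
FROM dbo.Orders
GROUP BY OrderDate;
GO
CREATE UNIQUE CLUSTERED INDEX IX_DailyTotals ON dbo.DailyTotals (OrderDate);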
It depends on the size of the table and the recovery model of the database. If you are deleting many hundreds of thousands of records and reinstating them, versus updating a small batch of a few hundred and inserting tens of rows, it will add unnecessary growth to your transaction log. However, you could use TRUNCATE to get around this, as it is only minimally logged.
Do you have the option of a MERGE/UPSERT? If you're using MS-SQL you can use CROSS APPLY to do something similar if you don't.
One approach to handling this type of problem is to insert into a new table, then do a table rename. This will ensure that all new data is present at the same time.
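A sketch of that approach on SQL Server (the object names and the stand-in query are placeholders):

SELECT o.Id, o.Total, c.Name
INTO dbo.ReportCache_New
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.Id = o.CustomerId;   -- stand-in for the real multi-join query

BEGIN TRANSACTION;
EXEC sp_rename 'dbo.ReportCache', 'ReportCache_Old';
EXEC sp_rename 'dbo.ReportCache_New', 'ReportCache';
COMMIT;

DROP TABLE dbo.ReportCache_Old;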
What if some data that was present yesterday isn't there anymore? DELETE may be safer, or you could end up deleting some records anyway.
In the end it doesn't really matter which way you go.
Unless it's the case #kevinw mentioned.
Although I fully agree with SQLMenace's answer, I would like to point out that MERGE does NOT remove unneeded records! If you're sure that your new data will be a superset of the existing data, then MERGE is great; otherwise you'll either need to delete any superfluous records later on, or use the TRUNCATE + INSERT method ...
(Personally I'm still a fan of the latter, as it usually is quite fast; just make sure to drop all indexes/unique constraints up front and rebuild them one by one. This has the benefit of the INSERT transaction being smaller, with the index-adding done in (smaller) transactions again later on.) (**)
(**: yes, this might be tricky on a live system, but then again he already mentioned this is done overnight anyway, so I'm extrapolating that there is no user access at that time.)

Entity Framework: equivalent of INSERT OR IGNORE

I'm using a database as a kind of cache - the purpose is to store currently known "facts" (the details aren't that interesting). Crucially, however, I'd like to avoid inserting duplicate facts. To do so, the current data access code uses insert or ignore in many places.
I'd like to use the Entity Framework to benefit from its manageability; I've had enough of writing boilerplate code for yet another subtly different query, and so far the Entity Framework's support for queries looks great.
However, I can't really seem to find good support for adding new facts. The obvious way seems to be to simply construct new entities in .NET, add them to the model, and save the changes to the database. However, this means that I need to check in advance whether a particular row already exists, since if it does the insert is not permitted. At the best of times that means extra roundtrips, but it's usually much worse: previously this code was highly parallel - the occasional duplicate insert wouldn't matter since the database would just ignore it - but now not only would I need to check whether a row exists beforehand, I'd need to do so in far smaller batches (which is typically many times slower due to transaction overhead), since just one already-existing row from a parallel insert would cause a rollback of the entire transaction.
So, does anyone know how to get similar behavior using the entity framework?
To be specific, I'd like to be able to perform several inserts and have the database simply ignore those inserts that are impossible - or more generally, to be able to express server-side conflict resolution code to avoid unnecessary roundtrips and transactions.
If it matters, I'm using SQLite.
This functionality can be implemented via stored procedures (which the Entity Framework can use) on DBs that support them (MSSQL, for instance). SQLite doesn't support stored procedures. However, a simple solution exists for SQLite as well, namely to use a trigger such as the following:
CREATE TRIGGER IF NOT EXISTS MyIgnoreTrigger BEFORE INSERT ON TheTable
FOR EACH ROW BEGIN
    -- Re-issue the insert ourselves, silently skipping rows that would be duplicates
    INSERT OR IGNORE
        INTO TheTable (col1, col2, col3)
        VALUES (NEW.col1, NEW.col2, NEW.col3);
    -- Cancel the original INSERT so the row isn't inserted twice
    SELECT RAISE(IGNORE);
END;
Alternatively, the trigger could be set up to directly verify the constraint and only raise "ignore" when the constraint is violated.
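For example, assuming col1 is the unique key, a sketch of that variant could be:

CREATE TRIGGER IF NOT EXISTS MyIgnoreTrigger BEFORE INSERT ON TheTable
FOR EACH ROW
WHEN (SELECT COUNT(*) FROM TheTable WHERE col1 = NEW.col1) > 0
BEGIN
    SELECT RAISE(IGNORE);   -- silently skip rows that would violate the constraint
END;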

Date ranges in views - is this normal?

I recently started working at a company with an enormous "enterprisey" application. At my last job, I designed the database, but here we have a whole Database Architecture department that I'm not part of.
One of the stranger things in their database is that they have a bunch of views which, instead of having the user provide the date ranges they want to see, join with a (global temporary) table "TMP_PARM_RANG" that has a start and end date. Every time the main app starts processing a request, the first thing it does is "DELETE FROM TMP_PARM_RANG;" followed by an insert into it.
This seems like a bizarre way of doing things, and not very safe, but everybody else here seems ok with it. Is this normal, or is my uneasiness valid?
Update: I should mention that they use transactions and per-client locks, so it is guarded against most concurrency problems. Also, there are literally dozens if not hundreds of views that all depend on TMP_PARM_RANG.
Do I understand this correctly?
There is a view like this:
SELECT * FROM some_table, tmp_parm_rang
WHERE some_table.date_column BETWEEN tmp_parm_rang.start_date AND tmp_parm_rang.end_date;
Then in some frontend a user inputs a date range, and the application does the following:
1. Deletes all existing rows from TMP_PARM_RANG
2. Inserts a new row into TMP_PARM_RANG with the user's values
3. Selects all rows from the view
I wonder if the changes to TMP_PARM_RANG are committed or rolled back, and if so when? Is it a temporary table or a normal table? Basically, depending on the answers to these questions, the process may not be safe for multiple users to execute in parallel. One hopes that if this were the case they would have already discovered that and addressed it, but who knows?
Even if it is done in a thread-safe way, making changes to the database for simple query operations doesn't make a lot of sense. These DELETEs and INSERTs are generating redo/undo (or whatever the equivalent is in a non-Oracle database) which is completely unnecessary.
A simple and more normal way of accomplishing the same goal would be to execute this query, binding the user's inputs to the query parameters:
SELECT * FROM some_table WHERE some_table.date_column BETWEEN ? AND ?;
If the database is Oracle, it's possibly a global temporary table; every session sees its own version of the table, and inserts/deletes won't affect other users.
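For reference, such a table would typically be declared something like this (the column names are assumed from the view above):

CREATE GLOBAL TEMPORARY TABLE TMP_PARM_RANG (
    start_date DATE,
    end_date   DATE
) ON COMMIT PRESERVE ROWS;   -- rows are private to the session and disappear when it ends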
There must be some business reason for this table. I've seen views with hardcoded dates that were actually partitioned views using dates as the partitioning field. I've also seen joining on a table like this when dealing with daylight saving time - imagine a view that returned all activity which occurred during DST. But none of those would ever delete from and insert into the table... that's just odd.
So either there is a deeper reason for this that needs to be dug out, or it's just something that at the time seemed like a good idea but why it was done that way has been lost as tribal knowledge.
Personally, I'm guessing that it would be a pretty strange occurrence. And from what you are saying, two methods calling the process at the same time could get very interesting.
Typically date ranges are done as filters on a view, and not driven by outside values stored in other tables.
The only justification I could see for this is if there was a multi-step process, that was only executed once at a time and the dates are needed for multiple operations, across multiple stored procedures.
I suppose it would let them support multiple ranges. For example, they can return all dates between 1/1/2008 and 1/1/2009 AND between 1/1/2006 and 1/1/2007, to compare 2006 data to 2008 data. You couldn't do that with a single pair of bound parameters. Also, I don't know how Oracle does its query plan caching for views, but perhaps it has something to do with that? With the date columns being checked as part of the view, the server could cache a plan that always assumes the dates will be checked.
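For instance, the multiple-ranges case might look like this (a sketch, reusing the column names assumed above):

INSERT INTO TMP_PARM_RANG (start_date, end_date) VALUES (DATE '2006-01-01', DATE '2007-01-01');
INSERT INTO TMP_PARM_RANG (start_date, end_date) VALUES (DATE '2008-01-01', DATE '2009-01-01');
-- the view now returns rows that fall in either range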
Just throwing out some guesses here :)
Also, you wrote:
"I should mention that they use transactions and per-client locks, so it is guarded against most concurrency problems."
While that may guard against data consistency problems due to concurrency, it hurts when it comes to performance problems due to concurrency.
Do they also add one (in the application) to generate the next unique value for the primary key?
It seems that the concept of shared state eludes these folks, or the reason for the shared state eludes us.
That sounds like a pretty weird algorithm to me. I wonder how it handles concurrency - is it wrapped in a transaction?
Sounds to me like someone just wasn't sure how to write their WHERE clause.
The views are probably used as temp tables. In SQL Server we can use a table variable or a temp table (# / ##) for this purpose. Although creating views is not recommended by experts, I have created lots of them for my SSRS projects because the tables I am working with do not reference one another (no FKs, seriously!). I have to work around deficiencies in the database design; that's why I am using views a lot.
With the global temporary table (GTT) approach that you say is being used here, the method is certainly safe with regard to a multiuser system, so no problem there. If this is Oracle, then I'd want to check that the system either uses an appropriate level of dynamic sampling so that the GTT is joined appropriately, or makes a call to DBMS_STATS to supply statistics on the GTT.
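For example (a sketch; the row and block counts are made-up representative values):

BEGIN
    DBMS_STATS.SET_TABLE_STATS(
        ownname => USER,
        tabname => 'TMP_PARM_RANG',
        numrows => 1,
        numblks => 1);
END;
/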