Pre-Staging Data Solution - SQL

I have been tasked with replacing a costly stored procedure which performs calculations across 10 - 15 tables, some of which contain many millions of rows. The plan is to pre-stage the many computations and store the results in separate tables to speed up reads.
Having quickly created these new tables and inserted all of the necessary pre-staged data as a test case, I find the execution time of getting the same results is vastly improved, as you would expect.
My question is, what is the best practice for keeping these new separate tables up to date?
A procedure which runs at a specific interval could do it, but there is a requirement for the data to be live.
A trigger on each table could do it, but that seems very costly, and could cause slow-downs everywhere else these tables are used.
Are there other alternatives?

Have you considered Indexed Views for this? As long as you meet the criteria for creating Indexed Views (no self joins, etc.), it may well be a good solution.
The downside of Indexed Views is that whenever the data in the underlying tables is changed (delete, update, insert), the indexed view has to be recalculated as part of the same operation. This can slow those operations down in certain circumstances, so you have to be careful. I've put some links to the documentation below:
https://www.brentozar.com/archive/2013/11/what-you-can-and-cant-do-with-indexed-views/
https://msdn.microsoft.com/en-GB/library/ms191432.aspx
https://technet.microsoft.com/en-GB/library/ms187864(v=sql.105).aspx
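For illustration, here is a minimal sketch of an indexed view (the table and column names are made up, not from the question):
-- The view must be schema-bound and reference tables with two-part names.
CREATE VIEW dbo.vCustomerTotals
WITH SCHEMABINDING
AS
SELECT o.CustomerId,
       SUM(o.Amount) AS TotalAmount,  -- Amount assumed NOT NULL (indexed views cannot SUM a nullable expression)
       COUNT_BIG(*)  AS RowCnt        -- COUNT_BIG(*) is required when the view has GROUP BY
FROM dbo.Orders AS o
GROUP BY o.CustomerId;
GO
-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vCustomerTotals ON dbo.vCustomerTotals (CustomerId);
Note that on non-Enterprise editions, queries generally need the WITH (NOEXPAND) hint to use the view's index.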

what is the best practice for keeping these new separate tables up to date?
The answer is: it depends. Depends on what?
1. How frequently you will use those computed values
2. What the acceptable data latency is
We have the same kind of reporting, where we store computed values in separate tables and use them in reports. In our case we run these stored procedures just before sending the reports, scheduled through SQL Server Agent.

Consider using an A/B table solution. Place a generic view over the _A version of the table (CREATE VIEW MY_TABLE AS SELECT * FROM MY_TABLE_A). Then rebuild the _B version, and switch the view to it (ALTER VIEW MY_TABLE AS SELECT * FROM MY_TABLE_B). It takes twice as much space for processing, but it gives you the opportunity to rebuild your tables without downtime.
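A rough sketch of that switch, with all column and source names hypothetical:
-- One-time setup: the view initially reads from the _A copy.
CREATE VIEW MY_TABLE AS SELECT * FROM MY_TABLE_A;
GO
-- Rebuild the offline _B copy (the SELECT is a stand-in for the expensive computation).
TRUNCATE TABLE MY_TABLE_B;
INSERT INTO MY_TABLE_B (KeyCol, ComputedValue)
SELECT KeyCol, ComputedValue FROM dbo.SomeStagingSource;
GO
-- Switch readers over in one quick metadata change.
ALTER VIEW MY_TABLE AS SELECT * FROM MY_TABLE_B;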

Related

How to automatically update the contents of a table after an insert, update or delete

I create a table from a join between 3 tables.
The issue that I'm facing is that this table becomes outdated once a new modification affects one of the original tables used to supply its data.
How should I solve this problem? Is a trigger the only solution?
You have several options:
Live with the outdated data and periodically (say once a day or once an hour) update the table.
Use a trigger to update the associated values (see the sketch below).
Use a view.
Use an indexed view if possible (see here).
The last option is a way to define a view that SQL Server updates automatically -- what is called a materialized view in other databases. That said, not all queries are supported in materialized views.
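As a sketch of the trigger option mentioned above (all table and column names are hypothetical; it recomputes only the keys touched by the statement):
CREATE TRIGGER trg_SourceTable_SyncDerived
ON dbo.SourceTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Remove the derived rows for every key touched by this statement...
    DELETE d
    FROM dbo.DerivedTable AS d
    WHERE d.Id IN (SELECT Id FROM inserted UNION SELECT Id FROM deleted);
    -- ...and recompute them from the base table.
    INSERT INTO dbo.DerivedTable (Id, Total)
    SELECT s.Id, SUM(s.Amount)
    FROM dbo.SourceTable AS s
    WHERE s.Id IN (SELECT Id FROM inserted UNION SELECT Id FROM deleted)
    GROUP BY s.Id;
END;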
"There are two problems in computer science. Cache coherency, and naming things." Tables derived from other tables exemplify both.
SQL Server supports materialized views, if you actually need them. Before you decide you do, do what every good carpenter does: measure twice and cut once. That is, write an ordinary view, and see how fast it is. If it's too slow -- however defined -- find out why. Most of the time, slow views can be sped up using indexes. If you have 100s of millions of rows, you might convert the view to a function that takes an argument applied to the WHERE clause, making it a "parameterized" view.
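The "parameterized view" idea translates to an inline table-valued function in SQL Server (names here are invented for illustration):
CREATE FUNCTION dbo.fn_OrdersForCustomer (@CustomerId INT)
RETURNS TABLE
AS
RETURN
    SELECT o.OrderId, o.Amount, o.OrderDate
    FROM dbo.Orders AS o
    WHERE o.CustomerId = @CustomerId;  -- the argument applied to the WHERE clause
GO
-- Used like a view, but the predicate is part of the plan from the start:
SELECT * FROM dbo.fn_OrdersForCustomer(42);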

More efficient to query a table or a view?

I am still trying to wrap my head around exactly how views work and when it is best to use a view vs querying a table directly. Here is my scenario:
All of the underlying data resides in a single table that stores three month's worth of data
The table includes four columns: 'TagName', 'Alarm', 'Timestamp', and 'Value'; 'TagName' and 'Timestamp' are indexed
There is a view of this table (no other tables involved) that shows one week's worth of data and combines 'Alarm' and 'Value' into a single column.
The database used in this scenario is SQL Server.
I only need to retrieve 5 - 30 minutes worth of data. Is it more efficient to write a query against the underlying table that combines 'Alarm' and 'Value' into a single column (like the view) and sets the time range dynamically, or to query the existing view, passing in a time range? To me, the former seems like the way to go, since the latter essentially requires two queries. Furthermore, in the second scenario, the first query (i.e. the view) would load an unnecessary number of values into memory.
If the data in the single table is not large, then querying the table directly will be faster than creating a view first and then querying it for the required data, as it avoids one step.
If the data volume is modest and the columns in the WHERE clause are properly indexed, then queries should generally go to the table directly, which is faster in most cases.
Views earn their keep when you have very large data in a single table and need to operate on the same small subset very frequently: the view defines that subset once, so you avoid re-writing the same time-consuming search query (on a single table or a join) over and over.
Before settling on a solution, validate and understand your data and requirements, run both approaches once, compare the timings (I think the query against the table should be the winner), and then take the decision.
Hope this helps.
In general you should simplify queries as much as possible by using views. This makes management of your application simpler and also helps you avoid repeating query logic (joins and WHERE clauses). However, a view can be ill-suited to a particular query, which is bad for performance, as it may drag in operations that query doesn't need. Views are abused when they introduce unnecessary complexity.
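To make this concrete for the scenario above: a (non-indexed) view is expanded inline by the optimizer, so filtering the view is normally equivalent to filtering the table directly. A sketch, guessing at how the columns are "combined" and assuming SQL Server 2012+ for CONCAT:
CREATE VIEW dbo.vLastWeek AS
SELECT TagName, [Timestamp],
       CONCAT(Alarm, ':', [Value]) AS AlarmValue  -- hypothetical way of combining the two columns
FROM dbo.Readings
WHERE [Timestamp] >= DATEADD(DAY, -7, GETDATE());
GO
-- This does not first materialize a week of data; the optimizer merges both
-- predicates and can seek the Timestamp index directly.
SELECT TagName, [Timestamp], AlarmValue
FROM dbo.vLastWeek
WHERE [Timestamp] >= DATEADD(MINUTE, -30, GETDATE());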

Temp table or permanent tables?

For my company I am redesigning some stored procedures. The original procedures use lots of permanent tables which are filled during the execution of the procedure, with the values deleted at the end. The number of rows can range from 100 to 50,000 for the calculation of aggregations.
My question is: will there be severe performance issues if I replace those tables with temp tables? Is it feasible to use temp tables?
It depends on how often you're using them, how long the processing takes, and whether you are concurrently accessing data from the tables while writing.
If you use a temp table, it won't be sitting around consuming index and cache resources while it's not in use, so you should save an ever so slight bit there. However, you will incur overhead with the temp tables (i.e. creating and destroying them).
I would re-examine how your queries function in the procedures and consider employing more in-procedure CURSOR operations instead of loading everything into tables and deleting it afterwards.
However, databases are for storing information and retrieving information. I would shy away from using permanent tables for routine temp work and stick with the temp tables.
Overall performance shouldn't be affected for the use case you specified in your question.
Hope this helps,
Jeffrey Kevin Pry
Yes, it's certainly feasible. You may want to check whether the permanent tables have any indexing on them to speed up joins and so on.
I agree with Jeffrey. It always depends.
Since you're using SQL Server 2008, you might have a look at table variables.
They should be lighter-weight than temp tables.
I define a user-defined function which returns a table variable, like this:
CREATE FUNCTION dbo.ufd_GetUsers ( @UserCode INT )
RETURNS @UsersTemp TABLE
(
    UserCode INT NOT NULL,
    RoleCode INT NOT NULL
)
AS
BEGIN
    -- Collect the roles related to the given user.
    INSERT @UsersTemp
    SELECT
        dbo.UsersRoles.Code, Roles.Code
    FROM
        dbo.UsersRoles
    INNER JOIN
        dbo.UsersRolesRelations ON dbo.UsersRoles.Code = dbo.UsersRolesRelations.UserCode
    INNER JOIN
        dbo.UsersRoles Roles ON dbo.UsersRolesRelations.RoleCode = Roles.Code
    WHERE dbo.UsersRoles.Code = @UserCode

    -- Add the user itself.
    INSERT @UsersTemp VALUES(@UserCode, @UserCode)

    RETURN
END
A big question is: can more than one person run one of these stored procedures at a time? I regularly see these kinds of tables carried over from old single-user databases (or from programmers who couldn't do subqueries or much of anything beyond SELECT * FROM). What happens if more than one user tries to run the same procedure? What happens if it crashes midway through -- does the table get cleaned up? With temp tables or table variables you have the ability to properly scope the table to just the current connection.
Definitely use a temporary table, especially since you've alluded to the fact that its purpose is to assist with calculations and aggregates. If you used a table inside one of your database's schemas all that work is going to be logged - written, backed up, and so on. Using a temporary table eliminates that overhead for data that in the end you probably don't care about.
You might actually save some time from the fact that you can drop the temp tables at the end instead of deleting rows (you said you have multiple users, so you have to delete rather than truncate). Deleting is a logged operation and can add considerable time to the process. If the permanent tables are indexed, then create the temp tables and index them as well. I would bet you would see an increase in performance, unless your tempdb is close to out of space.
Table variables might also work, but they can't be indexed after creation (only via constraints declared when they are defined) and they are generally only faster for smaller datasets. So you might try a combination: temp tables for the things that will be large enough to benefit from indexing, and table variables for the smaller items.
An advantage of using temp tables and table variables is that you guarantee that one user's process won't interfere with another user's. You say they currently have a way to identify which records belong to which run, but all it takes is one bug to break that when using permanent tables. Permanent tables for temporary processing are a very risky choice. Temp tables and table variables can never see the data from someone else's process and are thus far safer as a choice.
Table variables are normally the way to go.
SQL 2000 and below can have significant performance bottlenecks if many temp tables are being manipulated -- the issue is the blocking DDL on the system tables.
SQL 2005 is better, but table variables avoid the whole issue by not using those system tables at all, so they can perform without inter-user locking issues (except those involving the source data).
The issue is then that table variables only persist within scope, so if there is genuinely a large amount of data that needs to be processed repeatedly and persisted over a (relatively) long duration, then 'static' work tables may actually be faster -- they do need a user key of some sort and regular cleaning, though. A last resort, really.
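For reference, the two flavors side by side (columns are hypothetical):
-- Temp table: lives in tempdb, scoped to the connection, has statistics,
-- and can be indexed after creation.
CREATE TABLE #Work (Id INT PRIMARY KEY, Amount DECIMAL(18,2));
CREATE INDEX IX_Work_Amount ON #Work (Amount);

-- Table variable: scoped to the batch/procedure, no statistics; before
-- SQL 2014, indexes are only available via constraints declared inline.
DECLARE @Work TABLE (Id INT PRIMARY KEY, Amount DECIMAL(18,2));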

sql: DELETE + INSERT vs UPDATE + INSERT

A similar question has been asked, but since it always depends, I'm asking for my specific situation separately.
I have a website page that shows some data that comes from a database, and to generate that data I have to run some fairly complex queries with multiple joins.
The data is being updated once a day (nightly).
I would like to pre-generate the data for the said view to speed up the page access.
For that I am creating a table that contains exact data I need.
Question: for my situation, is it reasonable to do a complete table wipe followed by an insert, or should I do an update + insert?
SQL-wise, DELETE + INSERT seems easier (the INSERT part is a single SQL statement).
EDIT: RDBMS: MS SQL Server 2008 Ent
TRUNCATE will be faster than DELETE, so if you need to empty a table, do that instead.
You didn't specify your RDBMS vendor, but some of them also have MERGE/UPSERT commands. These enable you to update a row if it exists and insert it if it doesn't.
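Since the EDIT says SQL Server 2008, MERGE is available. A sketch with hypothetical staging and target tables; note the optional WHEN NOT MATCHED BY SOURCE clause, which addresses the stale-row concern raised further down:
MERGE dbo.ReportData AS target
USING dbo.ReportData_Staging AS source
    ON target.Id = source.Id
WHEN MATCHED THEN
    UPDATE SET [Value] = source.[Value]
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, [Value]) VALUES (source.Id, source.[Value])
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;  -- removes rows that no longer exist in the source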
It partly depends on how the data is accessed. If you have a period of time with no (or very few) users accessing it, then there won't be much impact on the data disappearing (between the DELETE and the completion of the INSERT) for a short while.
Have you considered using a materialized view (MSSQL calls them indexed views) instead of doing it manually? This could also have other performance benefits, as an indexed view gives the query optimizer more choices when it's constructing execution plans for other queries that reference the table(s) in the view.
It depends on the size of the table and the recovery model of the database. Deleting many hundreds of thousands of records and reinstating them, versus updating a small batch of a few hundred and inserting tens of rows, will add unnecessary size to your transaction logs. However, you could use TRUNCATE to get around this, as it is only minimally logged.
Do you have the option of a MERGE/UPSERT? If you're using MS-SQL you can use CROSS APPLY to do something similar if you don't.
One approach to handling this type of problem is to insert into a new table and then do a table rename. This ensures that all new data is present at the same time.
What if some data that was present yesterday isn't there any more? With UPDATE + INSERT it would linger, so DELETE may be safer -- you could end up having to delete those records anyway.
And in the end it doesn't really matter much which way you go, except in the case #kevinw mentioned.
Although I fully agree with SQLMenace's answer, I would like to point out that MERGE does NOT remove unneeded records by default! If you're sure that your new data will be a super-set of the existing data, then MERGE is great; otherwise you'll either need to make sure that you delete any superfluous records later on (or add a WHEN NOT MATCHED BY SOURCE clause), or use the TRUNCATE + INSERT method...
(Personally I'm still a fan of the latter, as it usually is quite fast; just make sure to drop all indexes/unique constraints upfront and rebuild them one by one afterwards. This has the benefit of keeping the INSERT transaction smaller, with the index-adding done later in (smaller) transactions.) (**)
(**: yes, this might be tricky on a live system, but then again he already mentioned this is done in some kind of overnight job anyway; I'm extrapolating that there is no user access at that time)
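The TRUNCATE + INSERT variant described above, as a sketch (names hypothetical):
TRUNCATE TABLE dbo.ReportData;                   -- minimally logged, very fast
DROP INDEX IX_ReportData_Value ON dbo.ReportData;
INSERT INTO dbo.ReportData (Id, [Value])
SELECT Id, [Value] FROM dbo.ReportData_Staging;  -- stand-in for the expensive multi-join query
CREATE INDEX IX_ReportData_Value ON dbo.ReportData ([Value]);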

MySQL Views - When to use & when not to

the mysql certification guide suggests that views can be used for:
creating a summary that may involve calculations
selecting a set of rows with a WHERE clause, hide irrelevant information
result of a join or union
allowing changes to the base table via a view that preserves the schema of the original table, to accommodate other applications
but from "how to implement search for 2 different table data?":
And maybe you're right that it doesn't work since mysql views are not good friends with indexing. But still. Is there anything to search for in the shops table?
I have learned that views don't work well with indexing, so: will it be a big performance hit, for the convenience they may provide?
A view can simply be thought of as a SQL query stored permanently on the server. Whatever indexes the underlying query would use are still used when the view is queried, so in that sense there is no difference between the SQL query and the view: the view does not affect performance any more negatively than the actual SQL query. If anything, since the query text is stored on the server and does not need to be sent and parsed at run time, it can be marginally faster.
It does afford you these additional advantages
reusability
a single source for optimization
This mysql-forum-thread about indexing views gives a lot of insight into what mysql views actually are.
Some key points:
A view is really nothing more than a stored select statement
The data of a view is the data of tables referenced by the View.
creating an index on a view will not work as of the current version
If the MERGE algorithm is used, then the indexes of the underlying tables will be used.
The underlying indices are not visible, however. DESCRIBE on a view will show no indexed columns.
MySQL views, according to the official MySQL documentation, are stored queries that when invoked produce a result set.
A database view is nothing but a virtual or logical table (commonly consisting of a SELECT query with joins). Because a database view is similar to a database table, consisting of rows and columns, you can query data against it.
Views should be used when:
Simplifying complex queries (ones involving IF/ELSE logic and JOINs, or working with triggers and such)
Putting an extra layer of security in place to limit or restrict data access (since views are merely virtual tables, they can be set read-only for a specific set of DB users, restricting INSERTs)
Backward compatibility and query reusability
Working with computed columns. Computed columns should NOT be in DB tables, because that would make for a bad schema design.
Views should not be used when:
the associated table(s) are tentative or subject to frequent structural change.
According to http://www.mysqltutorial.org/introduction-sql-views.aspx
A database table should not have calculated columns, but a database view can.
I tend to use a view when I need to calculate totals, counts, etc.
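For example, a MySQL view carrying the computed totals instead of storing them in the base table (names are hypothetical):
CREATE VIEW order_summary AS
SELECT customer_id,
       COUNT(*)    AS order_count,   -- computed here,
       SUM(amount) AS total_amount   -- not stored in the base table
FROM orders
GROUP BY customer_id;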
Hope that helps!
One more downside of views: they don't work well with MySQL replication, and can leave the slave lagging a bit behind the master.
http://bugs.mysql.com/bug.php?id=30998