I am trying to find an efficient/fast way of identifying all tables/columns updated by a specific process. Basically, we want to know every SQL column that is updated by a front-end ERP process.
I know of two ways: either enable Change Tracking on every single table, which is not very efficient, or spin up a blank test environment, perform the process, do row counts on all tables and then go and view the data.
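For reference, scripting the first option across every table would be something along these lines (a rough, untested sketch; it assumes Change Tracking is already enabled at the database level and that every table has a primary key):

DECLARE @sql NVARCHAR(MAX) = N'';

-- Build one ALTER TABLE ... ENABLE CHANGE_TRACKING statement per user table.
SELECT @sql += N'ALTER TABLE ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
             + N' ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);' + CHAR(10)
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id;

EXEC sys.sp_executesql @sql;   -- tables without a primary key will make this fail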
Does anyone else have a better method than the two described above?
I have a SSRS Sales report that will be run many times a day by users, but with different parameters selected for the branch and product types.
The SQL query uses some large tables and is quite complex; therefore, running it many times is going to have a performance cost.
I assumed the best solution would be to create a dataset for the report containing all permutations, run once overnight, and then apply filters when the users run the report.
I tried creating a snapshot in SSRS which doesn’t consider the parameters and therefore has all the required data, then filtering the Tablix using the parameters that the users selected. The snapshot works fine but it appears to be refreshed when the report is run with different parameters.
My next solution would be to create a table for the dataset which the report would then point to. I could recreate the table every night using a stored procedure. With a couple of small indexes the report would be lightning fast.
This solution would seem to work really well but my knowledge of SQL is limited, and I can’t help thinking this is not the right solution.
Is this suitable? Are there better ways? Can anybody confirm either way?
SSRS datasets have caching capabilities. I think you'll find this more useful than having to create extra DB tables and such.
Please see here https://learn.microsoft.com/en-us/sql/reporting-services/report-server/cache-shared-datasets-ssrs?view=sql-server-ver15
If the rate of change of the data is low enough, and SSRS Caching doesn't suit your needs, then you could manually cache the record set from the report query (without the filtering) into its own table, then you can modify the report to query from that table.
Oracle and most data warehouse implementations have a formal mechanism specifically for this called materialized views. There is no direct equivalent in SQL Server, though you can easily implement the same pattern yourself.
There are 2 significant drawbacks to this:
The data in the new table is a snapshot at the point in time that it was loaded, so this technique is better suited to slow moving datasets or reports where it is not critical that the data is 100% accurate.
You will need to manage the lifecycle of the data in this table; ideally you should set up a Job or Scheduled Task to automate this refresh, but you could trigger a refresh as part of the logic in your report (not recommended, but possible).
Though it is possible, you should NOT consider using a TRIGGER to update the data: you have already indicated the query takes some time to execute, so this could have a major impact on the rest of your LOB application.
If you do go down this path you should write the refresh logic into a stored procedure so that it can be executed when needed and from other internal and external automation mechanisms.
You should also add a column that records the date and time the dataset was loaded, then replace any references in your report that display the date and time the report was printed with the time the data was prepared.
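As a rough illustration of that shape (every object name below is invented, not from the question), the nightly refresh procedure might look something like this:

CREATE PROCEDURE dbo.RefreshSalesReportCache
AS
BEGIN
    SET NOCOUNT ON;

    -- Rebuild the cache inside one transaction so report readers never see a half-loaded table.
    BEGIN TRANSACTION;

    TRUNCATE TABLE dbo.SalesReportCache;

    INSERT INTO dbo.SalesReportCache (Branch, ProductType, SalesAmount, DataPreparedAt)
    SELECT s.Branch, s.ProductType, SUM(s.Amount), SYSDATETIME()   -- DataPreparedAt drives the "data as at" label
    FROM dbo.Sales AS s
    GROUP BY s.Branch, s.ProductType;

    COMMIT TRANSACTION;
END;

The overnight SQL Agent job simply executes the procedure, and the report dataset becomes a plain SELECT from dbo.SalesReportCache filtered by the branch and product type parameters.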
It is also worth pointing out that performance issues with expensive queries in SSRS reports can often be overcome by reducing the functions and value formatting in the SQL query itself and moving that logic into the report definition. This goes for filtering operations too: you can easily add computed columns in the dataset definition or on the design surface, and you can implement filtering directly in the tablix. There is no requirement that every record from the SQL query be displayed in the report, just as we do not need to show every column.
Sometimes some well-crafted indexes can help too; for complicated reports we can often find a balance between what the SQL engine can do efficiently and what the RDL can do for us.
Disclaimer: This is hypothetical advice, you should evaluate each report on a case by case basis.
I would like to know if there is an inherent flaw with the following way of using a database...
I want to create a reporting system with a web front end, whereby I query a database for the relevant data, and send the results of the query to a new data table using "SELECT INTO". Then the program would make a query from that table to show a "page" of the report. This has the advantage that if there is a lot of data, this can be presented a little at a time to the user as pages. The same data table can be accessed over and over while the user requests different pages of the report. When the web session ends, the tables can be dropped.
I am prepared to program around issues such as tracking the tables and ensuring they are dropped when needed.
I have a vague concern that over a long period of time the database itself might develop some form of maintenance problem, due to having created and dropped so many tables. Even day by day, let's say perhaps 1,000 such tables are created and dropped.
Does anyone see any cause for concern?
Thanks for any suggestions/concerns.
Before you start implementing your solution consider using SSAS or simply SQL Server with a good model and properly indexed tables. SQL Server, IIS and the OS all perform caching operations that will be hard to beat.
The cause for concern is that you're trying to write code that will try and outperform SQL Server and IIS... This is a classic example of premature optimization. Thousands and thousands of programmer hours have been spent on making sure that SQL Server and IIS are as fast and efficient as possible and it's not likely that your strategy will get better performance.
First of all: +1 to @Paul Sasik's answer.
Now, to answer your question (if you still want to go with your approach).
Possible cause for concern if you use VARBINARY(MAX) columns (from MSDN):
"If you drop a table that contains a VARBINARY(MAX) column with the FILESTREAM attribute, any data stored in the file system will not be removed."
If you do decide to go with your approach, I would use global temporary tables. They should get DROPped automatically when there are no more connections using them, but you can still DROP them explicitly.
In your query you can check whether they exist and create them if they don't (any longer):
IF OBJECT_ID('tempdb..##temp') IS NULL   -- global temp tables live in tempdb, not the user database
    -- create the ##temp table (SELECT ... INTO ##temp) and perform your query
This way you have most of the logic for performing your queries and managing the temporary tables in one place, which should make it more maintainable. Plus, they're built to be created and dropped, so it's quite safe to assume SQL Server would not be adversely impacted by creating and dropping a lot of them.
1000 per day should not be a concern if you talk about small tables.
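To make that concrete, a minimal sketch (invented names; the OFFSET/FETCH paging syntax assumes SQL Server 2012 or later) could be:

IF OBJECT_ID('tempdb..##ReportData') IS NULL
    SELECT o.OrderId, o.CustomerName, o.OrderTotal
    INTO ##ReportData
    FROM dbo.Orders AS o
    WHERE o.OrderDate >= '20240101';   -- the expensive query runs once per session

-- Each page request then only touches the cached rows.
SELECT OrderId, CustomerName, OrderTotal
FROM ##ReportData
ORDER BY OrderId
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;

In practice you would bake something like a session ID into the table name so concurrent users don't end up sharing one table.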
I don't know SQL Server, but in Oracle you have the concept of a temporary table. The data inserted into this type of table is visible only to the current session; when the session ends, the data "disappears". In this case you don't need to drop anything. Every user inserts into the same table, and their data is not visible to others. Advantage: less maintenance.
You may want to check whether you have something similar in SQL Server.
I have a database with 50 tables and I want to log users requests, such as inserts, updates or deletes on all the tables in the database. I can also create a trigger for this for each request type.
What is the best way to do this from a performance perspective or is there a better way to track this?
You can also create audit tables which are populated by triggers (and which allow much more flexibility than Change Data Capture). The critical component is to capture sets of data, not to try to work row by row. It does add some overhead, yes, but if you write the triggers correctly it isn't that much. Be sure to capture who (including which application, if you have multiple applications hitting the database) and when, as well as the old and new values. Set up one audit table per table you want audited (too much locking if you use only one audit table). And at the time you set up your system, write the code to get data back from a bad transaction or set of transactions. That makes it easier to recover when something does go wrong and you need to revert.
We use two tables per audited table: one contains the info about the process that did the changes (name of the application, date, user, etc., and an audit ID), the other contains the details of what was changed (old and new values, the ID of the record affected and the column affected). This structure lets us use the same layout for every table being audited, allows the audited tables to change without having to change the audit tables, and makes it easy to script the audit tables for new tables.
It is also easy for us to see which records were changed at the same time or in the same process, and to find out which of the many applications that touch our database was responsible for bad data, as well as who in particular was responsible. This helps us track down application bugs and find out why the data was changed the way it was in some cases. It also makes it easier to track down all the data affected by a broken process, rather than just the one record we knew about.
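A stripped-down sketch of that two-table pattern for a single audited column of one table (every name here is invented) might look like this:

CREATE TABLE dbo.Customer_AuditHeader (
    AuditId     INT IDENTITY PRIMARY KEY,
    AppName     NVARCHAR(128),
    AuditUser   NVARCHAR(128),
    AuditDate   DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
    AuditAction CHAR(1)                     -- 'I', 'U' or 'D'
);

CREATE TABLE dbo.Customer_AuditDetail (
    AuditId     INT NOT NULL,
    CustomerId  INT NOT NULL,
    ColumnName  SYSNAME NOT NULL,
    OldValue    NVARCHAR(MAX),
    NewValue    NVARCHAR(MAX)
);
GO
CREATE TRIGGER trg_Customer_Audit ON dbo.Customer
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- One header row per statement: who, when, from which application.
    DECLARE @AuditId INT;
    INSERT INTO dbo.Customer_AuditHeader (AppName, AuditUser, AuditAction)
    VALUES (APP_NAME(), SUSER_SNAME(), 'U');
    SET @AuditId = SCOPE_IDENTITY();

    -- Set-based detail capture: one row per changed value, never row by row.
    INSERT INTO dbo.Customer_AuditDetail (AuditId, CustomerId, ColumnName, OldValue, NewValue)
    SELECT @AuditId, d.CustomerId, 'CustomerName', d.CustomerName, i.CustomerName
    FROM deleted AS d
    JOIN inserted AS i ON i.CustomerId = d.CustomerId
    WHERE ISNULL(d.CustomerName, N'') <> ISNULL(i.CustomerName, N'');
END;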
If you have Enterprise Edition, look into Change Data Capture. If you don't have Enterprise and aren't interested in capturing the historical values of the columns that change, look into Change Tracking.
See Comparing Change Data Capture and Change Tracking to understand the differences between the two.
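For reference, turning either feature on is only a couple of statements (database and table names below are placeholders):

-- Change Tracking
ALTER DATABASE MyDatabase
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
ALTER TABLE dbo.Customer
    ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);

-- Change Data Capture (Enterprise Edition)
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Customer',
    @role_name     = NULL;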
Assuming all requests to insert, update and/or delete data go through some middle-tier data access layer, I would suggest you do your logging there. This is where we do all of ours. It is much simpler than trying to extract the actual insert/update/delete statements out of SQL Server.
If you want to do auditing of data, you can look into Change Data Capture (CDC). But this requires the Enterprise Edition.
I wasn't sure how to word this question, so I'll try to explain. I have a third-party database on SQL Server 2005. I have another server running SQL Server 2008, to which I want to "publish" some of the data from the third-party database. I shall then use this database as the back end for a portal and Reporting Services; it shall be the data warehouse.
On the destination server I want to store the data in different table structures from those in the third-party DB. Some tables I want to denormalize, and there are lots of columns that aren't necessary. I'll also need to add additional fields to some of the tables, which I'll need to populate based on data stored in the same rows. For example, there are varchar fields that contain info I'll want to use to populate other columns. All of this should cleanse the data and make it easier to report on.
I can write the query(s) to get all the info I want into a particular destination table. However, I want to be able to keep it up to date with the source on the other server. It doesn't have to be updated immediately (although that would be good), but I'd like it to be updated perhaps every 10 minutes. There are hundreds of thousands of rows of data, but the volume of changes and additions of new rows isn't huge.
I've had a look around, but I'm still not sure of the best way to achieve this. As far as I can tell, replication won't do what I need. I could manually write the T-SQL to do the updates, perhaps using the MERGE statement, and then schedule it as a job with SQL Server Agent. I've also been having a look at SSIS, and that looks to be geared towards this kind of ETL work.
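Something along the lines of the following is what I had in mind for the MERGE route (table and column names are made up, and it assumes the source rows have first been pulled into a staging table on the destination server, e.g. over a linked server):

MERGE INTO dw.DimCustomer AS tgt
USING stage.Customer AS src
    ON tgt.CustomerCode = src.CustomerCode
WHEN MATCHED AND (tgt.CustomerName <> src.CustomerName OR tgt.Region <> src.Region) THEN
    UPDATE SET CustomerName = src.CustomerName,
               Region       = src.Region
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerCode, CustomerName, Region)
    VALUES (src.CustomerCode, src.CustomerName, src.Region)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;   -- remove rows that no longer exist in the source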
I'm just not sure what to use to achieve this and I was hoping to get some advice on how one should go about doing this kind-of thing? Any suggestions would be greatly appreciated.
For those tables whose schemas/relations are not changing, I would still strongly recommend replication.
For the tables whose data and/or relations are changing significantly, I would recommend that you develop a Service Broker implementation to handle that. The high-level approach with Service Broker (SB) is:
Table-->Trigger-->SB.Service >====> SB.Queue-->StoredProc(activated)-->Table(s)
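In rough T-SQL terms, the plumbing for one table might look something like the following sketch (all names are invented, the database must already have Service Broker enabled, and dbo.usp_ProcessSyncQueue is a procedure you would write to RECEIVE the messages and apply them to the destination tables):

CREATE MESSAGE TYPE [//Sync/RowChanged] VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//Sync/Contract] ([//Sync/RowChanged] SENT BY INITIATOR);
CREATE QUEUE dbo.SyncQueue;
CREATE SERVICE [//Sync/Service] ON QUEUE dbo.SyncQueue ([//Sync/Contract]);
GO
CREATE TRIGGER trg_Orders_Sync ON dbo.Orders
AFTER INSERT, UPDATE     -- deletes would be handled similarly via the deleted table
AS
BEGIN
    SET NOCOUNT ON;
    IF NOT EXISTS (SELECT 1 FROM inserted) RETURN;

    DECLARE @handle UNIQUEIDENTIFIER;
    DECLARE @payload XML =
        (SELECT * FROM inserted FOR XML PATH('row'), ROOT('Changes'));

    BEGIN DIALOG CONVERSATION @handle
        FROM SERVICE [//Sync/Service]
        TO SERVICE '//Sync/Service'
        ON CONTRACT [//Sync/Contract]
        WITH ENCRYPTION = OFF;

    SEND ON CONVERSATION @handle MESSAGE TYPE [//Sync/RowChanged] (@payload);
END;
GO
-- The activated procedure drains the queue asynchronously and writes to the destination tables.
ALTER QUEUE dbo.SyncQueue
    WITH ACTIVATION (STATUS = ON,
                     PROCEDURE_NAME = dbo.usp_ProcessSyncQueue,
                     MAX_QUEUE_READERS = 1,
                     EXECUTE AS OWNER);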
I would not recommend SSIS for this, unless you wanted to go to something like daily exports/imports. It's fine for that kind of thing, but IMHO far too kludgey and cumbersome for either continuous or short-period incremental data distribution.
Nick, I have gone the SSIS route myself. I have jobs that run every 15 minutes that are based in SSIS and do the exact thing you are trying to do. We have a huge relational database and then we wanted to do complicated reporting on top of it using a product called Tableau. We quickly discovered that our relational model wasn't really so hot for that so I built a cube over it with SSAS and that cube is updated and processed every 15 minutes.
Yes SSIS does give the aura of being mainly for straight ETL jobs but I have found that it can be used for simple quick jobs like this as well.
I think staging and partitioning would be too much for your case. I am implementing the same thing in SSIS now, but with a frequency of 1 hour, as I need to allow some time for support activities. I am sure that SSIS is a good way of doing it.
During the design I had thought of another way to achieve custom replication: building on the Change Data Capture (CDC) process. That way you can get near real-time replication, but it is a tricky thing.
I'm currently performing a migration from a legacy database. I need to migrate millions of originating rows, breaking the original content apart into multiple destination parent/child rows.
As it's not a simple 1-to-1 migration, and the resulting rows are parent/child rows based on identity-generated keys, what's the best mechanism for performing the migration?
I'm assuming that I can't use bulk insert as the identity values for the child rows cannot be determined at the point of generating the script content? The only solution I can currently think of is to set the identity explicitly and then have a predetermined starting point for the import.
If anyone else has any input I'd appreciate the feedback.
This is my standard approach:
create your new data model
pull the data into the new DB unchanged
write (and run) a SQL script to perform the migration
test
(optional) drop the tables with the legacy data
You can get a long way towards migrating the data with plain SQL. For the case you described, you might not need a single cursor to get it across.
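For the identity problem from the question specifically, one set-based trick (sketched below with invented table names, SQL Server 2008 or later) is MERGE with an OUTPUT clause: unlike a plain INSERT, it lets you capture the legacy key alongside the newly generated identity, so the child rows can be wired up afterwards without cursors:

DECLARE @KeyMap TABLE (LegacyId INT, NewParentId INT);

-- ON 1 = 0 never matches, so every legacy row is inserted and mapped.
MERGE INTO dbo.NewParent AS tgt
USING dbo.LegacyRow AS src
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (Name, CreatedDate)
    VALUES (src.Name, src.CreatedDate)
OUTPUT src.LegacyId, inserted.ParentId INTO @KeyMap (LegacyId, NewParentId);

-- Children are then split out of the legacy content by joining on the map.
INSERT INTO dbo.NewChild (ParentId, Detail)
SELECT km.NewParentId, d.Detail
FROM dbo.LegacyRowDetail AS d
JOIN @KeyMap AS km ON km.LegacyId = d.LegacyId;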
Running the process in Query Analyzer (or an analog in your dbms), you'll have the advantage that you can wrap everything in a Transaction so that you can roll back if anything goes wacky along the way. Write it in little bits and test it in chunks, on your dev database. Once everything is working correctly, set the script loose on your production database.
Sorted.
Thanks for the suggestion but I'd prefer to produce a programmatic solution. I'm currently using Nant / CruiseControl to automate the tests and need something I can recreate on the fly based on the current live legacy content.