How to add dates to database records for trending analysis - sql

I have a SQL Server database table that contains a few thousand records. These records are populated by PowerShell scripts on a weekly basis. The scripts overwrite the previous week's data, so the table only ever holds information pertaining to the previous week. I would like to take a copy of that table's data each week and add a date column with that day's date beside each record. I need this so I can do trend analysis in the future.
Unfortunately, I don't have access to the PowerShell scripts to edit them. Is there any way I can accomplish this using MS SQL Server or some other way?

You can do the following. Create a table that will contain the clone plus a date column. Insert the results from your original table, along with the current date, into the clone table. From your description you don't need a WHERE clause, because the original table is wiped each week and only holds the new data. After the initial table creation there is no need to create it again; you'll simply run the insert each week. Obviously the below is very basic and is just to provide you the framework.
CREATE TABLE yourTableClone
(
    col1 int,
    col2 varchar(5),
    ...
    col5 date
)
insert into yourTableClone
select *, getdate()
from yourOriginalTable
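If you need this to happen automatically each week, one option is to wrap the insert in a stored procedure and schedule it with a SQL Server Agent job. A minimal sketch, assuming the table names above (the procedure name is hypothetical):
-- Hypothetical weekly snapshot procedure; schedule it with SQL Server Agent.
CREATE PROCEDURE dbo.usp_WeeklySnapshot
AS
BEGIN
    SET NOCOUNT ON;

    -- Stamp this week's copy of the data with today's date.
    insert into yourTableClone
    select *, getdate()
    from yourOriginalTable;
END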

Related

creating materialized view for annual report based on slow function

Consider the following scenario:
I have a table products with 1 million product IDs:
create table products (
pid number,
p_description varchar2(200)
)
There is also a relatively slow function
function getProductMetrics(pid, date) return number
which returns some metric for the given product at the given date.
There is also an annual report, executed every year, based on the following query:
select pid,p_description,getProductMetrics(pid,'2019-12-31') from
products
That query takes about 20-40 minutes to execute for a given year.
Would it be a correct approach to create a materialized view (MV) for this scenario using the following:
CREATE TABLE mydates
(
mydate date
);
INSERT INTO mydates (mydate)
VALUES (DATE '2019-12-31');
INSERT INTO mydates (mydate)
VALUES (DATE '2018-12-31');
INSERT INTO mydates (mydate)
VALUES (DATE '2017-12-31');
CREATE MATERIALIZED VIEW metrics_summary
BUILD IMMEDIATE
REFRESH FORCE ON DEMAND
AS
SELECT pid,
       getProductMetrics(pid, mydate) AS annual_metric,
       mydate
FROM products, mydates
Or would it take forever?
Also, how and how often would I update this MV?
Metrics data is required for the end of each year.
But any year's data could be requested at any time.
Note that I have no control over the slow function - it's just a given.
thanks.
First, you do not have a "group by" query, so you can remove that.
An MV would be most useful if you needed to recompute all of the data for all years. Since this appears to be a summary with no need to reprocess old data, updated only when certain threshold dates (such as year end) pass, I would recommend putting the results in a normal table and appending the updates as often as your threshold dates occur (annually?) using a stored procedure. Otherwise your MV will take longer to run and require more system resources with every refresh that adds a new date.
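A minimal sketch of that approach, assuming the products table above (the summary table and procedure names are hypothetical):
-- Hypothetical summary table, populated once per threshold date.
CREATE TABLE metrics_summary (
    mydate        DATE,
    pid           NUMBER,
    annual_metric NUMBER
);

-- Run once per year (e.g. from a scheduled job) with the new threshold date.
CREATE OR REPLACE PROCEDURE add_metrics_for_date(p_date IN DATE) AS
BEGIN
    INSERT INTO metrics_summary (mydate, pid, annual_metric)
    SELECT pid, getProductMetrics(pid, p_date), p_date
    FROM products;
    COMMIT;
END;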
Do not create a materialized view. This is not just a performance issue. It is also an archiving issue: You don't want to run the risk that historical results could change.
My advice is to create a single table with a "year" column. Run the query once per year and insert the rows into the new table. This is an archive of the results.
Note: If you want to recalculate previous years because the results may have changed (say the data is updated somehow), then you should store those results in a separate table and decide which version is the "right" version. You may find that you want an archive table with both the "as-of" date and the "run-date" to see how results might be changing.
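If you do go the recalculation route, a minimal sketch of such an archive table, with both dates suggested above (table and column names are illustrative):
-- Hypothetical archive keeping every recalculation; pick the "right"
-- version per as-of date when reporting.
CREATE TABLE metrics_archive (
    as_of_date DATE,    -- the year-end date the metric describes
    run_date   DATE,    -- when this calculation was executed
    pid        NUMBER,
    metric     NUMBER
);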

How to take daily snapshot of table in SQL Server?

One of the tables has to be a hierarchy of product reps and their assigned areas. These reps and their areas change every day, and I need to keep track of what exactly that table looks like every day, so I will need to take snapshots of the table daily. I would like to know what I have to do, or how I have to store the data in the table, to be able to know exactly what the data in the table was at a certain point in time. Is this possible? Please keep in mind that the table will not be more than one megabyte and has an incremental load. I do not want to use any tool for it; I want to build the logic in a stored procedure only.
You can do one of these:
Create a new table each day, and copy the data of your table into it;
Create one new table with the same structure as your table, plus one additional date column, to store the date of the snapshot taken, then each day copy your table along with the current system date;
Make your existing table a temporal table (as also suggested by sticky bit in the comments). Please note, that you need SQL Server 2016 or newer for this.
My personal preference is the last option, but the first two may be easier for you.
For the first 2 options you need to create a SQL Server Agent job to run nightly and take the snapshots. The 3rd option works automatically.
Let's say your table is named MyTable and has primary key ID int and field Name varchar(50).
For the first option you need to use dynamic SQL, because each time your new table's name will be different:
declare @sql nvarchar(max) = N'select ID, Name into MyTable_' +
    convert(nvarchar(10), getdate(), 112) + N' from MyTable'
exec (@sql)
When executed, this statement will create a new table with the same structure as your existing table, but named with the current date as suffix, e.g. MyTable_20190116, and copy MyTable to it.
For the second option you need to create one table like below, and copy data to it using a script like this:
create table MyTableDailySnapshots(
SnapshotDate date not null
, ID int not null
, Name varchar(50)
, constraint PK_MyTableDailySnapshots primary key clustered (SnapshotDate, ID)
)
insert into MyTableDailySnapshots(SnapshotDate, ID, Name)
select GETDATE(), ID, Name
from MyTable
If you choose the third option, no actions are needed to maintain the snapshots. Just use query like this, to get the state of the table for a period of time:
select ID, Name from MyTable
for system_time between '2019-01-16 00:00:00.0000000' and '2019-01-16 23:59:59.9999999'
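For the third option, a minimal sketch of turning the existing table into a temporal table (assuming MyTable as above; the history table name is hypothetical):
-- Requires SQL Server 2016+. Add the period columns, then enable versioning.
alter table MyTable add
    ValidFrom datetime2 generated always as row start hidden
        constraint DF_MyTable_ValidFrom default sysutcdatetime(),
    ValidTo datetime2 generated always as row end hidden
        constraint DF_MyTable_ValidTo default convert(datetime2, '9999-12-31 23:59:59.9999999'),
    period for system_time (ValidFrom, ValidTo);

alter table MyTable
    set (system_versioning = on (history_table = dbo.MyTableHistory));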
The first option is more flexible if your table's schema changes over time, because each day you can create a table with a different schema. Options 2 and 3 have only one table to store the snapshots, so you may need to be creative if your table's schema needs to change. But the disadvantage of the first option is the large number of tables created in your database.
So it is up to you to choose what's the best for your case.

Extract rows which haven't been extracted already using SQL Server stored procedure

I have a table Customers. I'm trying to design a process that will extract data from the Customers table daily and create a CSV of this data. I want to pick only those records which haven't been extracted yet. How can I keep track of whether a record has been extracted or not? I cannot alter the Customers table to add a flag.
So far I'm planning to use a stage table which will have this flag. I'm writing a stored procedure to get the data from the Customers table into the stage table, with the flag set to 0 for each record. Then SSIS will create the CSV from the stage table, and once the records have been extracted into the CSV, the staging table will be updated with flag = 1 for those records.
What is a good design for this problem?
Customer table:
CustomerID | Name | RecordCreated | RecordUpdated
Create another table tblExportedEmpID with a column CustomerID. Add the CustomerID of each customer extracted from the Customer table into that new table. Then, to extract the customers from the Customer table which haven't been extracted yet, you can use this query:
select * from customer where customerid not in(select customerid from tblExportedEmpID)
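A minimal sketch of that bookkeeping, assuming the table names above:
-- Hypothetical tracking table: one row per already-exported customer.
create table tblExportedEmpID (CustomerID int primary key)

-- After each successful export, record the IDs that were just extracted
-- so they are skipped on the next run.
insert into tblExportedEmpID (CustomerID)
select c.CustomerID
from customer c
where c.CustomerID not in (select CustomerID from tblExportedEmpID)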
You have RecordCreated and RecordUpdated. Why even bother with a separate per-record tracking table if you have that information?
You'll need to create a table or equivalent "saved until next run" data area. The first thing you have your script do is grab the current time, and whatever was stored in that data area. Then, have your statement query everything:
SELECT <list of columns and transformations>
FROM Customers
WHERE RecordCreated >= @lastRunTime AND RecordCreated < @currentRunTime
(or RecordUpdated, if you need to re-extract when a customer's name changes)
Note that you want the exclusive upper-bound (<) to cover the case where your stored timestamp has less resolution than the mechanism getting the timestamp.
For the last step, store off your run start - whatever the script grabbed for "current time" - into the "saved until next run" data area.
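A sketch of that flow, with the "saved until next run" area as a one-row watermark table (the table and column names are hypothetical):
-- Hypothetical one-row watermark table.
create table ExtractWatermark (LastRunTime datetime2 not null)
insert into ExtractWatermark values ('1900-01-01')

-- Grab the current time and the stored watermark first.
declare @currentRunTime datetime2 = sysdatetime()
declare @lastRunTime datetime2 = (select LastRunTime from ExtractWatermark)

select CustomerID, Name
from Customers
where RecordCreated >= @lastRunTime and RecordCreated < @currentRunTime

-- Only after the extract succeeds, advance the watermark.
update ExtractWatermark set LastRunTime = @currentRunTime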

Sql Server 2008 partition table based on insert date

My question is about table partitioning in SQL Server 2008.
I have a program that loads data into a table every 10 mins or so. Approx 40 million rows per day.
The data is bcp'ed into the table and needs to be able to be loaded very quickly.
I would like to partition this table based on the date the data is inserted into the table. Each partition would contain the data loaded in one particular day.
The table should hold the last 50 days of data, so every night I need to drop any partitions older than 50 days.
I would like to have a process that aggregates data loaded into the current partition every hour into some aggregation tables. The summary will only ever run on the latest partition (since all other partitions will already be summarised) so it is important it is partitioned on insert_date.
Generally when querying the data, the insert date is specified (or multiple insert dates). The detailed data is queried by drilling down from the summarised data and as this is summarised based on insert date, the insert date is always specified when querying the detailed data in the partitioned table.
Can I create a column Insert_date in the table with a default value of GETDATE(), and then partition on this somehow?
OR
I can create a column insert_date in the table and populate it with a hard-coded value of today's date.
What would the partition function look like?
Would separate tables and a partitioned view be better suited?
I have tried both, and even though I think partitioned tables are cooler, after trying to teach others how to maintain the code afterwards it just wasn't justified. In that scenario we used a hard-coded date field set in the insert statement.
Now I use separate tables (31 days / 31 tables) plus an aggregation table, and there is an ugly UNION ALL query that joins together the monthly data.
Advantage: super simple SQL, and simple C# code for bcp, and nobody has complained about complexity.
But if you have the infrastructure and a gaggle of .NET / SQL gurus, I would choose the partitioning strategy.
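To the question of what the partition function would look like: a minimal sketch of daily partitioning on an insert-date column with a GETDATE() default (boundary dates and names are illustrative; in practice you would script all 50 boundaries and slide them nightly):
-- Hypothetical daily partition function and scheme.
create partition function pf_InsertDate (date)
as range right for values ('2019-01-01', '2019-01-02', '2019-01-03')

create partition scheme ps_InsertDate
as partition pf_InsertDate all to ([PRIMARY])

create table LoadData
(
    insert_date date not null default (getdate()),
    payload varchar(100)
) on ps_InsertDate (insert_date)

-- Nightly maintenance: add tomorrow's boundary, then retire the oldest
-- (after switching out or truncating the old data).
alter partition scheme ps_InsertDate next used [PRIMARY]
alter partition function pf_InsertDate() split range ('2019-01-04')
alter partition function pf_InsertDate() merge range ('2019-01-01')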

Sql server issue dealing with huge volume of data

I have a requirement like this: I need to delete all the customers who have not done a transaction for the past 800 days.
I have a table Customer, where CustomerID is the primary key.
The Creditcard table has columns CustomerID and CreditcardID, where CreditcardID is the primary key.
The Transaction table has columns transactiondatetime, CreditcardID, and CreditcardTransactionID, which is the primary key in this table.
All the Transaction table data is in a view called CreditcardTransaction, so I am using the view to get the information.
I have written a query to get the credit cards that have done a transaction in the past 800 days, to get their CreditcardID and store it in a table.
As the volume of data in the CreditcardTransaction view is around 60 million rows, the query I have written fails and logs a message that the log file is full, then throws a System.OutOfMemoryException.
INSERT INTO Tempcard
SELECT CreditcardID, transactiondatetime
FROM CreditcardTransaction
WHERE DATEDIFF(DAY, CreditcardTransaction.transactiondatetime, GETDATE()) > 600
I need to get each CreditcardID and when its last transactiondatetime was.
I need to show the data in an Excel sheet, so I am dumping the data into a table and then inserting it into Excel.
What is the best solution I should go ahead with here?
I am using an SSIS package (VS 2008 R2) where I call an SP to dump data into a table, then do a few pieces of business logic, and finally insert the data into an Excel sheet.
Thanks
prince
One thought: using a function in a WHERE clause can slow things down considerably. Consider adding a column named IdleTransactionDays. This will allow you to use the DATEDIFF function in the SELECT clause instead. Later, you can query the Tempcard table to return the records with IdleTransactionDays greater than 600, similar to this:
INSERT INTO Tempcard
(CreditcardID, transactiondatetime, IdleTransactionDays)
SELECT CreditcardID, transactiondatetime,
       DATEDIFF(DAY, CreditcardTransaction.transactiondatetime, GETDATE())
FROM CreditcardTransaction

SELECT *
FROM Tempcard
WHERE IdleTransactionDays > 600
Hope this helps,
Andy
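Another option worth noting: the filter becomes sargable if you compare transactiondatetime against a precomputed cutoff instead of wrapping the column in DATEDIFF. A sketch, assuming the same view and target table:
-- Precompute the cutoff once so the predicate can use an index
-- on transactiondatetime (assumes one exists or can be created).
DECLARE @cutoff datetime = DATEADD(DAY, -600, GETDATE())

INSERT INTO Tempcard (CreditcardID, transactiondatetime)
SELECT CreditcardID, transactiondatetime
FROM CreditcardTransaction
WHERE transactiondatetime < @cutoff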
Currently you're inserting those records row by row. You could create a SSIS package that reads your data with an OLEDB Source component, performs the necessary operations and bulk inserts them (a minimally logged operation) into your destination table.
You could also directly output your rows into an Excel file. Writing rows to an intermediate table decreases performance.
If your source query still times out, investigate whether suitable indexes exist and whether they are too fragmented.
You could also partition your source data by year (based on transactiondatetime). This way the data will be loaded in bursts.
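A sketch of that burst-style loading as a plain T-SQL loop, one year per pass (the date bounds and target table are illustrative):
-- Hypothetical loop: load one year per iteration to keep each
-- transaction, and the log growth, small.
DECLARE @start datetime = '20080101'
WHILE @start < '20140101'
BEGIN
    INSERT INTO Tempcard (CreditcardID, transactiondatetime)
    SELECT CreditcardID, transactiondatetime
    FROM CreditcardTransaction
    WHERE transactiondatetime >= @start
      AND transactiondatetime < DATEADD(YEAR, 1, @start)

    SET @start = DATEADD(YEAR, 1, @start)
END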