creating materialized view for annual report based on slow function - sql

Consider the following scenario.
I have a table, products, with 1 million product IDs:
create table products (
pid number,
p_description varchar2(200)
);
There is also a relatively slow function
function getProductMetrics(pid,date) return number
which returns some metric for the given product at the given date.
There is also an annual report, executed every year, that is based on the following query:
select pid,p_description,getProductMetrics(pid,'2019-12-31') from products
That query takes about 20-40 minutes to execute for a given year.
Would it be a correct approach to create a materialized view (MV) for this scenario using the following
create table mydates (
mydate date
);
INSERT INTO mydates (mydate)
VALUES (DATE '2019-12-31');
INSERT INTO mydates (mydate)
VALUES (DATE '2018-12-31');
INSERT INTO mydates (mydate)
VALUES (DATE '2017-12-31');
getProductMetrics(pid,mydate) AS annual_metric,
FROM products,mydates
or would it take forever?
Also, how, and how often, would I update this MV?
Metrics data is required for the end of each year.
But any year's data could be requested at any time.
Note that I have no control over the slow function; it's just a given.

First, you do not have a "group by" query, so you can remove that.
An MV would be most useful if you needed to recompute all of the data for all years. As this appears to be a summary, with no need to reprocess old data, updated only when certain threshold dates like end of year are passed, I would recommend putting the results in a normal table and only adding the updates as often as your threshold dates occur (annually?) using a stored procedure. Otherwise your MV will take longer to run and require more system resources with every execution that adds a new date.

Do not create a materialized view. This is not just a performance issue. It is also an archiving issue: You don't want to run the risk that historical results could change.
My advice is to create a single table with a "year" column. Run the query once per year and insert the rows into the new table. This is an archive of the results.
Note: If you want to recalculate previous years because the results may have changed (say the data is updated somehow), then you should store those results in a separate table and decide which version is the "right" version. You may find that you want an archive table with both the "as-of" date and the "run-date" to see how results might be changing.
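The archive-table idea from the answers above could look roughly like the following sketch (Oracle syntax). The table name product_metrics_archive and the columns as_of_date and run_date are hypothetical, and the sketch assumes getProductMetrics accepts a DATE as its second argument, which the question does not confirm.

```sql
-- Hypothetical archive table: the table name and the as_of_date / run_date
-- columns are illustrative, not part of the original schema.
CREATE TABLE product_metrics_archive (
    as_of_date    DATE          NOT NULL,  -- year-end date the metric refers to
    run_date      DATE          NOT NULL,  -- when the slow query was actually run
    pid           NUMBER        NOT NULL,
    p_description VARCHAR2(200),
    annual_metric NUMBER,
    CONSTRAINT product_metrics_archive_pk PRIMARY KEY (as_of_date, pid)
);

-- Run once after each year end. Assumes getProductMetrics takes a DATE
-- as its second argument.
INSERT INTO product_metrics_archive
    (as_of_date, run_date, pid, p_description, annual_metric)
SELECT DATE '2019-12-31',
       SYSDATE,
       p.pid,
       p.p_description,
       getProductMetrics(p.pid, DATE '2019-12-31')
FROM   products p;

COMMIT;

-- The annual report then reads the stored rows instead of calling the function.
SELECT pid, p_description, annual_metric
FROM   product_metrics_archive
WHERE  as_of_date = DATE '2019-12-31';
```

With the primary key on (as_of_date, pid), a second run for the same year fails instead of silently changing archived results. If several runs per year should be kept side by side, run_date would need to become part of the key.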


Automatically add date for each day in SQL

I'm working on BigQuery and have created a view using multiple tables. Each day, data needs to be synced with multiple platforms. I need to insert a date or some other field via SQL that lets me identify which rows were added to the view or updated each day, so that I can take only that data forward instead of syncing everything every day. The best way I can think of is to somehow add the current date whenever a row is updated, but that date needs to stay constant until a further update happens for that record.
Sample data
Say we get the view T1 on 1st September and T2 on 2nd. I need to spot only ID:2 for 1st September and ID:3,4,5 on 2nd September. Note: there is no such date column. I need help creating such a column, or any other approach to verify which rows are getting updated/added daily.
You can create a BigQuery scheduled query with a daily (24 hours) frequency using the INSERT statement below:
INSERT INTO dataset.T1
date > (SELECT MAX(date) FROM dataset.T1);
The table that the data is streamed into (in your case: sample data) needs to be configured as a partitioned table. For that, use "Partition by ingestion time" so that you don't need to handle the date yourself.
Configuration in BQ
After you have recreated that table, append your existing data to the new table with the help of the format options in BQ (append) and RUN.
Then you create a view based on that table with:
FROM `your_dataset.your_sample_data_table`
WHERE rank = 1
From then on, always use the view.
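As a rough sketch of such a view over an ingestion-time partitioned table: the column name id, the view name your_sample_data_latest and the aliases ingested_at and row_rank are assumptions for illustration. It also assumes that only new or changed rows are appended each day.

```sql
-- Keep only the most recently ingested version of each id.
-- _PARTITIONTIME is the pseudo-column BigQuery provides on
-- ingestion-time partitioned tables; it must be aliased to be selected.
CREATE OR REPLACE VIEW `your_dataset.your_sample_data_latest` AS
SELECT * EXCEPT (row_rank)
FROM (
  SELECT
    t.*,
    _PARTITIONTIME AS ingested_at,
    ROW_NUMBER() OVER (
      PARTITION BY id
      ORDER BY _PARTITIONTIME DESC
    ) AS row_rank
  FROM `your_dataset.your_sample_data_table` AS t
)
WHERE row_rank = 1;

-- Rows that were added or updated today (UTC) and need to be synced.
SELECT *
FROM `your_dataset.your_sample_data_latest`
WHERE DATE(ingested_at) = CURRENT_DATE();
```

Because ingested_at comes from the partition the row landed in, it stays constant for a record until a newer version of that record is appended.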

Create a dynamic view based on partitioned tables

We have a large database with monthly partitioned tables. I need to aggregate a selection of these tables every month but I don't want to update the union all every month to add the new monthly table.
CREATE VIEW dynamic_view AS
SELECT timestamp,
FROM traffic_table_m_2017_01
SELECT timestamp,
FROM traffic_table_m_2017_02
Is this where I would use a stored procedure? I am not really familiar with them.
I think it would also work as:
SELECT timestamp,
FROM REPLACE(REPLACE('traffic_table_m_yyyy_mm',
yyyy, FORMAT(GETDATE(),'yyyy', 'en-us')),
mm, FORMAT(GETDATE(),'mm', 'en-us'));
This might work for the current month but I would need to save the data from the past months which would also be an issue.
You should append each table, as it arrives, to one larger table and then run your queries against that. There are many ways to do this, but probably the fastest and most elegant is to use.
Instructions here
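One plain-SQL way to do the append, as a minimal sketch assuming SQL Server (suggested by GETDATE and FORMAT in the question) and monthly tables with identical columns. The names dbo.traffic_all and source_month are hypothetical, and this is not necessarily the tool the answer had in mind.

```sql
-- One-time setup: build the consolidated table from the first monthly table.
-- source_month records which monthly table each row came from.
SELECT CAST('2017-01' AS char(7)) AS source_month, t.*
INTO   dbo.traffic_all
FROM   dbo.traffic_table_m_2017_01 AS t;

-- Each month: append the newly arrived table, unless that month
-- has already been loaded.
INSERT INTO dbo.traffic_all
SELECT '2017-02', t.*
FROM   dbo.traffic_table_m_2017_02 AS t
WHERE  NOT EXISTS (
           SELECT 1
           FROM   dbo.traffic_all AS a
           WHERE  a.source_month = '2017-02'
       );
```

The aggregation can then query dbo.traffic_all directly, so no UNION ALL has to be edited when a new monthly table appears.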

How to add dates to database records for trending analysis

I have a SQL Server database table that contains a few thousand records. These records are populated by PowerShell scripts on a weekly basis. The scripts basically overwrite last week's data, so the table only has information pertaining to the previous week. I would like to be able to take a copy of that table's data each week and add a date column with that day's date beside each record. I need this so that I can do trend analysis in the future.
Unfortunately, I don't have access to the PowerShell scripts to edit them. Is there any way I can accomplish this using MS SQL server or some other way?
You can do the following. Create a table that will contain the clone + dates. Insert the results from your original table along with the date into your clone table. From your description you don't need a where clause because the results of the original table are wiped out only holding new data. After the initial table creation there is no need to do it again. You'll just simply do the insert piece. Obviously the below is very basic and is just to provide you the framework.
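CREATE TABLE yourTableClone
(
col1 int,
col2 varchar(5), -- ... remaining columns of yourOriginalTable
col5 date -- the date the copy was taken
);
-- Append the current rows, stamped with today's date
INSERT INTO yourTableClone
SELECT *, CAST(GETDATE() AS date)
FROM yourOriginalTable;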
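If the insert is run on a schedule (for example from a SQL Server Agent job), a guard against taking the same snapshot twice may be useful. The following is only a sketch that reuses the placeholder names above and assumes col5 is the snapshot date column.

```sql
-- Take at most one dated snapshot per day; safe to run more than once.
DECLARE @today date = CAST(GETDATE() AS date);

IF NOT EXISTS (SELECT 1 FROM yourTableClone WHERE col5 = @today)
BEGIN
    INSERT INTO yourTableClone
    SELECT *, @today
    FROM yourOriginalTable;
END;

-- Example trend query: number of rows captured per snapshot date.
SELECT col5 AS snapshot_date, COUNT(*) AS row_count
FROM yourTableClone
GROUP BY col5
ORDER BY col5;
```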

Sql Server 2008 partition table based on insert date

My question is about table partitioning in SQL Server 2008.
I have a program that loads data into a table every 10 mins or so. Approx 40 million rows per day.
The data is bcp'ed into the table and needs to be able to be loaded very quickly.
I would like to partition this table based on the date the data is inserted into the table. Each partition would contain the data loaded in one particular day.
The table should hold the last 50 days of data, so every night I need to drop any partitions older than 50 days.
I would like to have a process that aggregates data loaded into the current partition every hour into some aggregation tables. The summary will only ever run on the latest partition (since all other partitions will already be summarised) so it is important it is partitioned on insert_date.
Generally when querying the data, the insert date is specified (or multiple insert dates). The detailed data is queried by drilling down from the summarised data and as this is summarised based on insert date, the insert date is always specified when querying the detailed data in the partitioned table.
Can I create a default column in the table "Insert_date" that gets a value of Getdate() and then partition on this somehow?
I can create a column in the table "insert_date" and put a hard coded value of today's date.
What would the partition function look like?
Would separate tables and a partitioned view be better suited?
I have tried both, and even though I think partitioned tables are cooler, after trying to teach others how to maintain the code it just wasn't justified. In that scenario we used a hard-coded date field that was set in the insert statement.
Now I use separate tables (31 days / 31 tables) plus an aggregation table, and there is an ugly union all query that joins together the monthly data.
Advantage: super simple SQL and simple C# code for bcp, and nobody has complained about complexity.
But if you have the infrastructure and a gaggle of .NET / SQL gurus, I would choose the partitioning strategy.
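For the partitioned-table route, the question asks what the partition function would look like. A minimal sketch for SQL Server 2008 follows. All object names, the payload column, the boundary dates and the single-filegroup layout are illustrative assumptions. Whether a bulk load fills the default depends on the bcp options and format file used, so that part is worth verifying.

```sql
-- Daily boundaries; with RANGE RIGHT each boundary value starts a new partition.
CREATE PARTITION FUNCTION pf_insert_date (date)
AS RANGE RIGHT FOR VALUES ('20100101', '20100102', '20100103');

-- All partitions on one filegroup, for simplicity.
CREATE PARTITION SCHEME ps_insert_date
AS PARTITION pf_insert_date ALL TO ([PRIMARY]);

-- insert_date defaults to today's date and is the partitioning column.
CREATE TABLE dbo.load_detail (
    insert_date date NOT NULL
        CONSTRAINT df_load_detail_insert_date DEFAULT (CAST(GETDATE() AS date)),
    payload     varchar(100) NULL
) ON ps_insert_date (insert_date);

-- Nightly maintenance, part 1: add the next day's boundary.
ALTER PARTITION SCHEME ps_insert_date NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_insert_date() SPLIT RANGE ('20100104');

-- Nightly maintenance, part 2: remove the oldest day. Partition 1 holds
-- anything before the first boundary, so partition 2 is the oldest day.
CREATE TABLE dbo.load_detail_old (
    insert_date date NOT NULL,
    payload     varchar(100) NULL
) ON [PRIMARY];

ALTER TABLE dbo.load_detail SWITCH PARTITION 2 TO dbo.load_detail_old;
DROP TABLE dbo.load_detail_old;
ALTER PARTITION FUNCTION pf_insert_date() MERGE RANGE ('20100101');
```

Switching the oldest partition out and dropping the staging table is a metadata operation, so it avoids deleting roughly 40 million rows one by one.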

How do I Automatically insert monthly records into a table via SQL?

I'm trying to generate monthly records in one table based on instructions in another table. Software - MS Access 2007, though I'm looking for an SQL solution here. To greatly simplify the matter, let's say the following describes the tables:
TaskManager
- DayDue
- TaskName
Task
- DateDue
- TaskName
So what happens is that there may be an entry in TaskManager {15, "Accounts due"}, and this should lead to an "Accounts due" record in the Task table with the due date being the 15th of each month. I'd want it to create records for the last few months and the next year.
What I'm thinking that I need to do is first create a SELECT query that results in x records for each record in the TaskManager table, with a date for each month. After that, I do an INSERT query which inserts records into the Task table if they do not EXIST in the aforementioned SELECT query.
I think I can manage the INSERT query, though I'm having trouble figuring out how to do the SELECT query. Could someone give me a pointer?
You could use a calendar table.
INSERT INTO Task ( DateDue, TaskName )
SELECT calendar.CalDate, TaskManager.TaskName
FROM calendar, TaskManager
WHERE (((Day([CalDate]))=TaskManager.DayDue)
AND ((calendar.CalDate)<#7/1/2013#));
The calendar table would simply contain all dates and other such relevant fields as work day (yesno). Calendar tables are generally quite useful.
Here is the solution I developed using Remou's Calendar table idea.
First create a Calendar table, which simply contains all dates for a desired range. It's easy to just make the dates in Excel and paste them into the table. This is also a very reliable way of doing it, as Excel handles leap years correctly for the modern range of dates.
After building this table, there are three queries to run. The first is a SELECT, which selects every possible task generated by the TaskManager based on the date and frequency. This query is called TaskManagerQryAllOptions, and has the following code:
SELECT TaskManager.ID, Calendar.CalendarDate
FROM TaskManager INNER JOIN Calendar ON
TaskManager.DateDay = Day(Calendar.CalendarDate)
WHERE (TaskManager.Frequency = "Monthly")
OR (TaskManager.Frequency = "Yearly" AND
TaskManager.DateMonth = Month(Calendar.CalendarDate))
OR (TaskManager.Frequency = "Quarterly" AND
(((Month(Calendar.CalendarDate)- TaskManager.DateMonth) Mod 3) = 0));
The bulk of the above is to cover the different options a quarterly Day and Month pair could cover. The next step is another SELECT query, which selects records from the TaskManagerQryAllOptions in which the date is within the required range. This query is called TaskManagerQrySelect.
SELECT TaskManagerQryAllOptions.ID, TaskManager.TaskName,
TaskManagerQryAllOptions.CalendarDate
FROM TaskManagerQryAllOptions INNER JOIN TaskManager
ON TaskManagerQryAllOptions.ID = TaskManager.ID
WHERE (TaskManagerQryAllOptions.CalendarDate > Date()-60)
AND (TaskManagerQryAllOptions.CalendarDate < Date()+370)
AND (TaskManagerQryAllOptions.CalendarDate >= TaskManager.Start)
AND ((TaskManagerQryAllOptions.CalendarDate <= TaskManager.Finish)
OR (TaskManager.Finish Is Null))
ORDER BY TaskManagerQryAllOptions.CalendarDate;
The final query is an INSERT. As we will be using this query frequently, we don't want it to generate duplicates, so we need to filter out already created records.
INSERT INTO Task ( TaskName, TaskDate )
SELECT TaskManagerQrySelect.TaskName, TaskManagerQrySelect.CalendarDate
FROM TaskManagerQrySelect
WHERE Not Exists(
SELECT * FROM Task
WHERE Task.TaskName = TaskManagerQrySelect.TaskName
AND Task.TaskDate = TaskManagerQrySelect.CalendarDate);
One limitation of this method is that if the date of repetition (e.g. the 15th of each month) is changed, the future records with the wrong day will remain. A solution to this would be to update all the future records with the adjusted date, then run the insert.
One possibility could be to create a table of Months, and a table of Years (prior year, current, and next one). I could run a SELECT query which takes the Day from the TaskManager table, the Month from the Month table, and the Year from the Year table - I imagine that this could somehow create my desired multiple records for a single TaskManager record. Though I'm not sure what the exact SQL would be.
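The Months-and-Years idea could be sketched in Access SQL roughly as follows. The tables Months (with a MonthNum column holding 1 to 12) and Years (with a YearNum column) are hypothetical and would have to be created first.

```sql
SELECT DateSerial(Years.YearNum, Months.MonthNum, TaskManager.DayDue) AS DateDue,
       TaskManager.TaskName
FROM TaskManager, Months, Years
WHERE Day(DateSerial(Years.YearNum, Months.MonthNum, TaskManager.DayDue))
      = TaskManager.DayDue;
```

Listing the three tables without a join condition produces every combination of task, month and year. The WHERE clause is there because DateSerial rolls invalid dates forward (day 31 in February becomes a date in March), so any combination whose resulting day no longer matches DayDue is dropped.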