How do I automatically insert monthly records into a table via SQL?

I'm trying to generate monthly records in one table based on instructions in another table. Software - MS Access 2007, though I'm looking for an SQL solution here. To greatly simplify the matter, let's say the following describes the tables:
TaskManager:
- DayDue
- TaskName
Task:
- DateDue
- TaskName
So there may be an entry in TaskManager {15, "Accounts due"}, and this should lead to an "Accounts due" record in the Task table with the due date being the 15th of each month. I'd want it to create records for the last few months and the next year.
What I think I need to do is first create a SELECT query that results in x records for each record in the TaskManager table, with a date for each month. After that, I run an INSERT query which inserts the rows from the aforementioned SELECT query into the Task table if they do not already EXIST there.
I think I can manage the INSERT query, though I'm having trouble figuring out how to do the SELECT query. Could someone give me a pointer?

You could use a calendar table.
INSERT INTO Task ( DateDue, TaskName )
SELECT calendar.CalDate, TaskManager.TaskName
FROM calendar, TaskManager
WHERE (((Day([CalDate]))=TaskManager.DayDue)
AND ((calendar.CalDate)<#7/1/2013#));
The calendar table would simply contain all dates and other such relevant fields as work day (yesno). Calendar tables are generally quite useful.
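The calendar-table idea can be tried out quickly outside Access. The following is a minimal sketch using Python's standard-library sqlite3 module, not the Access query itself: the date range is arbitrary sample data, and SQLite's strftime('%d', ...) stands in for the Access Day() function.

```python
import sqlite3
from datetime import date, timedelta


def build_calendar(conn, start, end):
    """Create a calendar table holding one row per day in [start, end]."""
    conn.execute("CREATE TABLE calendar (CalDate TEXT PRIMARY KEY)")
    days = (end - start).days + 1
    conn.executemany(
        "INSERT INTO calendar (CalDate) VALUES (?)",
        [((start + timedelta(days=i)).isoformat(),) for i in range(days)],
    )


conn = sqlite3.connect(":memory:")
build_calendar(conn, date(2013, 1, 1), date(2013, 6, 30))  # sample range only
conn.execute("CREATE TABLE TaskManager (DayDue INTEGER, TaskName TEXT)")
conn.execute("INSERT INTO TaskManager VALUES (15, 'Accounts due')")
conn.execute("CREATE TABLE Task (DateDue TEXT, TaskName TEXT)")

# Same shape as the query above: pair every date with every rule,
# then keep the dates whose day-of-month matches the rule.
conn.execute(
    """
    INSERT INTO Task (DateDue, TaskName)
    SELECT c.CalDate, tm.TaskName
    FROM calendar AS c, TaskManager AS tm
    WHERE CAST(strftime('%d', c.CalDate) AS INTEGER) = tm.DayDue
    """
)
due_dates = [r[0] for r in conn.execute("SELECT DateDue FROM Task ORDER BY DateDue")]
```

With six months in the calendar, one rule produces six Task rows, one per month.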

Here is the solution I developed using Remou's Calendar table idea.
First create a Calendar table, which simply contains all dates for a desired range. It's easy to just make the dates in Excel and paste them into the table. This is also a very reliable way of doing it, as Excel handles leap years correctly for the modern range of dates.
After building this table, there are three queries to run. The first is a SELECT, which selects every possible task generated by the TaskManager based on the date and frequency. This query is called TaskManagerQryAllOptions, and has the following code:
SELECT TaskManager.ID, Calendar.CalendarDate
FROM TaskManager INNER JOIN Calendar ON
TaskManager.DateDay = Day(Calendar.CalendarDate)
WHERE (TaskManager.Frequency = "Monthly")
OR (TaskManager.Frequency = "Yearly" AND
TaskManager.DateMonth = Month(Calendar.CalendarDate))
OR (TaskManager.Frequency = "Quarterly" AND
(((Month(Calendar.CalendarDate)- TaskManager.DateMonth) Mod 3) = 0));
The bulk of the above is to cover the different options a quarterly Day and Month pair could cover. The next step is another SELECT query, which selects records from the TaskManagerQryAllOptions in which the date is within the required range. This query is called TaskManagerQrySelect.
SELECT TaskManagerQryAllOptions.ID, TaskManager.TaskName,
TaskManagerQryAllOptions.CalendarDate
FROM TaskManagerQryAllOptions INNER JOIN TaskManager
ON TaskManagerQryAllOptions.ID = TaskManager.ID
WHERE (TaskManagerQryAllOptions.CalendarDate > Date()-60)
AND (TaskManagerQryAllOptions.CalendarDate < Date()+370)
AND (TaskManagerQryAllOptions.CalendarDate >= TaskManager.Start)
AND ((TaskManagerQryAllOptions.CalendarDate <= TaskManager.Finish)
OR (TaskManager.Finish Is Null))
ORDER BY TaskManagerQryAllOptions.CalendarDate;
The final query is an INSERT. As we will be using this query frequently, we don't want it to generate duplicates, so we need to filter out already created records.
INSERT INTO Task ( TaskName, TaskDate )
SELECT TaskManagerQrySelect.TaskName, TaskManagerQrySelect.CalendarDate
FROM TaskManagerQrySelect
WHERE Not Exists(
SELECT *
FROM Task
WHERE Task.TaskName = TaskManagerQrySelect.TaskName
AND Task.TaskDate = TaskManagerQrySelect.CalendarDate);
One limitation of this method is that if the date of repetition (e.g. the 15th of each month) is changed, the future records with the wrong day will remain. A solution to this would be to update all the future records with the adjusted date, then run the insert.
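To check that the NOT EXISTS filter really makes the insert safe to rerun, here is a small sketch in Python with sqlite3. The Candidates table is a hypothetical stand-in for the TaskManagerQrySelect query, and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Task (TaskName TEXT, TaskDate TEXT);
    -- Candidates is a hypothetical stand-in for TaskManagerQrySelect.
    CREATE TABLE Candidates (TaskName TEXT, CalendarDate TEXT);
    INSERT INTO Task VALUES ('Accounts due', '2013-01-15');
    INSERT INTO Candidates VALUES
        ('Accounts due', '2013-01-15'),
        ('Accounts due', '2013-02-15'),
        ('Accounts due', '2013-03-15');
    """
)

INSERT_MISSING = """
    INSERT INTO Task (TaskName, TaskDate)
    SELECT c.TaskName, c.CalendarDate
    FROM Candidates AS c
    WHERE NOT EXISTS (
        SELECT 1
        FROM Task AS t
        WHERE t.TaskName = c.TaskName
          AND t.TaskDate = c.CalendarDate
    )
"""


def insert_missing(conn):
    """Insert candidates that are not yet in Task; return how many were added."""
    before = conn.execute("SELECT COUNT(*) FROM Task").fetchone()[0]
    conn.execute(INSERT_MISSING)
    after = conn.execute("SELECT COUNT(*) FROM Task").fetchone()[0]
    return after - before


first_run = insert_missing(conn)   # adds only the two missing months
second_run = insert_missing(conn)  # adds nothing: every candidate already exists
```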

One possibility could be to create a table of Months, and a table of Years (prior year, current, and next one). I could run a SELECT query which takes the Day from the TaskManager table, the Month from the Month table, and the Year from the Year table - I imagine that this could somehow create my desired multiple records for a single TaskManager record. Though I'm not sure what the exact SQL would be.

Related

creating materialized view for annual report based on slow function

Consider the following scenario:
I have a table, products, with 1 million product IDs:
create table products (
pid number,
p_description varchar2(200)
)
There is also a relatively slow function
function getProductMetrics(pid, date) return number
which returns some metric for the given product at a given date.
There is also an annual report, executed every year, that is based on the following query:
select pid,p_description,getProductMetrics(pid,'2019-12-31') from
products
That query takes about 20-40 minutes to execute for a given year.
Would it be a correct approach to create a materialized view (MV) for this scenario using the following
CREATE TABLE mydates
(
mydate date
);
INSERT INTO mydates (mydate)
VALUES (DATE '2019-12-31');
INSERT INTO mydates (mydate)
VALUES (DATE '2018-12-31');
INSERT INTO mydates (mydate)
VALUES (DATE '2017-12-31');
CREATE MATERIALIZED VIEW metrics_summary
BUILD IMMEDIATE
REFRESH FORCE ON DEMAND
AS
SELECT pid,
getProductMetrics(pid, mydate) AS annual_metric,
mydate
FROM products,mydates
or would it take forever?
Also, how and how often would I update this MV?
Metrics data is required for the end of each year.
But any year's data could be requested at any time.
Note, that I have no control over the slow function - it's just a given.
thanks.
First, you do not have a "group by" query, so you can remove that.
An MV would be most useful if you needed to recompute all of the data for all years. As this appears to be a summary, with no need to reprocess old data, updated only when certain threshold dates like end of year are passed, I would recommend putting the results in a normal table and only adding the updates as often as your threshold dates occur (annually?) using a stored procedure. Otherwise your MV will take longer to run and require more system resources with every execution that adds a new date.
Do not create a materialized view. This is not just a performance issue. It is also an archiving issue: You don't want to run the risk that historical results could change.
My advice is to create a single table with a "year" column. Run the query once per year and insert the rows into the new table. This is an archive of the results.
Note: If you want to recalculate previous years because the results may have changed (say the data is updated somehow), then you should store those results in a separate table and decide which version is the "right" version. You may find that you want an archive table with both the "as-of" date and the "run-date" to see how results might be changing.
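The archive-table advice could look roughly like this. It is a sketch in Python with sqlite3, not Oracle: get_product_metrics is a hypothetical stand-in that returns a dummy number instead of the real slow function, and the table name metrics_archive and its columns are assumptions.

```python
import sqlite3


def get_product_metrics(pid, as_of):
    """Hypothetical stand-in for the slow function; returns a dummy number."""
    return pid * 10 + int(as_of[:4]) % 10


conn = sqlite3.connect(":memory:")
conn.create_function("getProductMetrics", 2, get_product_metrics)
conn.executescript(
    """
    CREATE TABLE products (pid INTEGER PRIMARY KEY, p_description TEXT);
    INSERT INTO products VALUES (1, 'widget'), (2, 'gadget');
    -- One row per product, per as-of date, per run: old runs are never overwritten.
    CREATE TABLE metrics_archive (
        pid INTEGER,
        as_of_date TEXT,
        run_date TEXT,
        annual_metric REAL,
        PRIMARY KEY (pid, as_of_date, run_date)
    );
    """
)


def archive_year(conn, as_of, run_date):
    """Compute the metric once for every product and append it to the archive."""
    conn.execute(
        """
        INSERT INTO metrics_archive (pid, as_of_date, run_date, annual_metric)
        SELECT pid, ?, ?, getProductMetrics(pid, ?)
        FROM products
        """,
        (as_of, run_date, as_of),
    )


archive_year(conn, "2018-12-31", "2019-01-02")
archive_year(conn, "2019-12-31", "2020-01-02")

# The annual report then reads stored values instead of calling the function.
rows_2019 = conn.execute(
    "SELECT pid, annual_metric FROM metrics_archive "
    "WHERE as_of_date = '2019-12-31' ORDER BY pid"
).fetchall()
```

Each year costs one slow run; any earlier year can then be requested at any time without recomputation.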

Automatically add date for each day in SQL

I'm working on BigQuery and have created a view using multiple tables. Each day, data needs to be synced with multiple platforms. I need to insert a date or some other field via SQL so that I can identify which rows were added to the view each day, or which rows were updated, so that I only carry that data forward each day instead of syncing everything every day. The best way I can think of is to somehow add the current date wherever an update to a row happens, but that date needs to stay constant until a further update happens for that record.
Ex:
Sample data
Say we get the view T1 on 1st September and T2 on 2nd. I need to spot only ID:2 for 1st September and ID:3,4,5 on 2nd September. Note: there is no such date column. I need help creating such a column, or any other approach to verify which rows are being updated or added daily.
You can create a BigQuery scheduled query with a daily (24-hour) frequency using the INSERT statement below:
INSERT INTO dataset.T1
SELECT
*
FROM
dataset.T2
WHERE
date > (SELECT MAX(date) FROM dataset.T1);
The table that the data is streamed into (in your case, the sample data) needs to be configured as a partitioned table. Use "Partition by ingestion time" so that you don't need to handle the date yourself.
Configuration in BQ
After you have recreated that table, append your existing data to the new table with the help of the format options in BQ (append) and RUN.
Then you create a view based on that table with:
SELECT * EXCEPT (rank)
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY invoice_id ORDER BY _PARTITIONTIME desc) AS rank
FROM `your_dataset.your_sample_data_table`
)
WHERE rank = 1
From then on, always use the view.
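To see the "latest row per key" pattern run end to end, here is a small sketch in Python with sqlite3 instead of BigQuery. It assumes an SQLite build with window functions (3.25 or later). The ingest_date column is a hypothetical stand-in for _PARTITIONTIME, and the rows and dates are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- ingest_date plays the role of BigQuery's _PARTITIONTIME pseudo-column.
    CREATE TABLE sample_data (invoice_id INTEGER, amount INTEGER, ingest_date TEXT);
    INSERT INTO sample_data VALUES
        (1, 100, '2023-09-01'),
        (2, 200, '2023-09-01'),
        (2, 250, '2023-09-02'),
        (3, 300, '2023-09-02');
    """
)

# Number each invoice's versions from newest to oldest and keep the newest.
LATEST = """
    SELECT invoice_id, amount, ingest_date
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY invoice_id ORDER BY ingest_date DESC
               ) AS rn
        FROM sample_data
    )
    WHERE rn = 1
"""

latest = conn.execute(LATEST + " ORDER BY invoice_id").fetchall()

# Rows to sync for one day: the newest versions that arrived on that day.
changed_ids = [
    r[0]
    for r in conn.execute(
        LATEST + " AND ingest_date = ? ORDER BY invoice_id", ("2023-09-02",)
    )
]
```

Filtering the deduplicated result by ingestion date gives the rows that were added or updated on that day, which is what the daily sync needs.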

Excluding results that are within same month in SQL

I have two tables -- one, a history table that contains a log of some kind of entries, and another (let's call it flags) that contains columns about flags (for a certain account). Both tables contain account IDs.
I want to write a query that only extracts rows from the flag table if the account ID does not already have an entry for that month in the history table (e.g., in the flag table, an entry was entered on April 2, 2019 and in the history table, the account already had an entry recorded on April 1, 2019. The result is, the April 2nd entry should not be pulled up).
I have a query right now that basically looks like this:
SELECT *multiple column names*
FROM flags
WHERE NOT EXISTS (SELECT acc_id FROM history WHERE ...)
This is where I am stuck. With the subquery, I basically want to get the matches where the dates from both tables match (same month and year), and with the WHERE NOT EXISTS, exclude the results from flag that are found in the subquery (essentially I only want results where the date for the entry is not from the same month)
The most important columns are:
the account ID (to correctly associate each log entry to the right account)
date (to only get rows where the month recorded is not already logged in the history table)
I initially used MONTH(), but that only extracts the month of the date. I need it to match both the month and the year because the history table contains a few years of data.
Any help would be greatly appreciated! Thank you in advance!
SELECT *multiple column names*
FROM flags
WHERE NOT EXISTS (
SELECT 1
FROM history
WHERE history.acc_id=flags.acc_id
AND date_trunc('month', history.date) =
date_trunc('month', flags.date)
)
The date_trunc function will work for Postgres, which was one of the original tags. If you aren't using Postgres, there may be a similar function in your database, or you could format the dates as just year-month and compare the resulting strings.
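The year-month string comparison can be sketched as follows, using Python with sqlite3, where strftime('%Y-%m', ...) does the formatting. The column name entry_date and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE flags (acc_id INTEGER, entry_date TEXT);
    CREATE TABLE history (acc_id INTEGER, entry_date TEXT);
    INSERT INTO history VALUES (1, '2019-04-01');
    INSERT INTO flags VALUES
        (1, '2019-04-02'),  -- same account, same month and year: excluded
        (1, '2019-05-03'),  -- same account, different month: kept
        (1, '2020-04-05'),  -- same month number, different year: kept
        (2, '2019-04-10');  -- different account: kept
    """
)

# Compare 'YYYY-MM' strings so that both the year and the month must match.
QUERY = """
    SELECT f.acc_id, f.entry_date
    FROM flags AS f
    WHERE NOT EXISTS (
        SELECT 1
        FROM history AS h
        WHERE h.acc_id = f.acc_id
          AND strftime('%Y-%m', h.entry_date) = strftime('%Y-%m', f.entry_date)
    )
    ORDER BY f.acc_id, f.entry_date
"""
kept = conn.execute(QUERY).fetchall()
```

Unlike a comparison on MONTH() alone, the April 2020 flag survives even though the history table has an April 2019 entry.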

SQL Server 2005 simple query to get all dates between two dates

I know there are a lot of solutions to this but I am looking for a simple query to get all the dates between two dates.
I cannot declare variables.
As per the comment above, it's just guesswork without your table structures and further detail. Also, are you using a 3NF database or star schema structures, etc. Is this a transaction system or a data warehouse?
As a general answer, I would recommend creating a Calendar table, that way you can create multiple columns for Working Day, Weekend Day, Business Day, etc. and add a date key value, starting at 1 and incrementing each day.
Your query then is a very simple sub-select or join to the table to do something like
SELECT date FROM Calendar WHERE date BETWEEN <x> AND <y>
How to create a Calender table for 100 years in Sql
There are other options like creating the calendar table using iterations (eg, as a CTE table) and linking to that.
SQL - Create a temp table or CTE of first day of the month and month names
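As an illustration of the CTE option, a recursive CTE can produce every date between two dates without declaring variables. This sketch uses Python with sqlite3, so the syntax differs from SQL Server 2005 (which would use DATEADD and has no RECURSIVE keyword, and which stops at 100 recursion levels unless the MAXRECURSION option is raised). The function name dates_between is made up for the example.

```python
import sqlite3


def dates_between(conn, start, end):
    """Return every ISO date from start to end inclusive, via a recursive CTE."""
    sql = """
        WITH RECURSIVE dates(d) AS (
            SELECT date(?)                 -- anchor row: the start date
            UNION ALL
            SELECT date(d, '+1 day')       -- recursive step: add one day
            FROM dates
            WHERE d < date(?)              -- stop once the end date is reached
        )
        SELECT d FROM dates
    """
    return [row[0] for row in conn.execute(sql, (start, end))]


conn = sqlite3.connect(":memory:")
span = dates_between(conn, "2024-02-27", "2024-03-01")
```

For longer ranges a permanent Calendar table is usually the simpler choice, since it is built once and can carry extra columns such as working-day flags.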

Efficient sliding window sum over a database table

A database has a transactions table with columns: account_id, date, transaction_value (signed integer). Another table (account_value) stores the current total value of each account, which is the sum of all transaction_values per account. It is updated with a trigger on the transactions table (i.e., INSERTs, UPDATEs and DELETEs to transactions fire the trigger to change the account_value.)
A new requirement is to calculate the account's total transaction value only over the last 365 days. Only the current running total is required, not previous totals. This value will be requested often, almost as often as the account_value.
How would you implement this "sliding window sum" efficiently? A new table is ok. Is there a way to avoid summing over a year's range every time?
This can be done with standard windowing functions:
SELECT account_id,
sum(transaction_value) over (partition by account_id order by date)
FROM transactions
The order by inside the over() clause makes the sum a "sliding sum".
For the "only the last 365 days" requirement you'd need a second query that limits the rows in the WHERE clause.
The above works in PostgreSQL, Oracle, DB2 and (I think) Teradata. SQL Server does not support the order by in the window definition (the upcoming Denali version will AFAIK)
As simple as this?
SELECT
SUM(transaction_value), account_id
FROM
transactions t
WHERE
-- SQL Server, Sybase:
t.DATE >= DATEADD(year, -1, GETDATE())
-- MySQL:
-- t.DATE >= DATE_SUB(NOW(), INTERVAL 12 MONTH)
GROUP BY
account_id;
You may want to remove the time component from the date expressions using DATE (MySQL) or this way in SQL Server
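For a version of the filtered sum that can be run and checked, here is a sketch in Python with sqlite3. "Today" is passed in as a parameter so the result is reproducible. The column name tx_date and the sample rows are assumptions, and the window is taken as the 365 days ending today.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transactions (account_id INTEGER, tx_date TEXT, transaction_value INTEGER);
    INSERT INTO transactions VALUES
        (1, '2023-01-01', 100),  -- older than 365 days on 2024-02-01
        (1, '2023-06-01', 50),
        (1, '2024-01-15', -20),
        (2, '2023-03-01', 10);
    """
)


def rolling_year_totals(conn, today):
    """Sum each account's transactions dated within the 365 days ending today."""
    sql = """
        SELECT account_id, SUM(transaction_value)
        FROM transactions
        WHERE tx_date > date(?, '-365 days')
          AND tx_date <= date(?)
        GROUP BY account_id
    """
    return dict(conn.execute(sql, (today, today)).fetchall())


totals = rolling_year_totals(conn, "2024-02-01")
```

Because the dates are stored without a time component, the comparison is on whole days. With an index on (account_id, tx_date) this range scan may well be fast enough, which is worth measuring before adding extra tables.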
If queries of the transactions table are more frequent than inserts to the transactions table, then perhaps a view is the way to go?
You are going to need a one-off script to populate the existing table with values for the preceding year for each existing record - that will need to run for the whole of the previous year for each record generated.
Once the rolling year column is populated, one alternative to summing the previous year would be to derive each new record's value as the previous record's rolling year value, plus the transaction value(s) since the last update, minus the transaction values between one year prior to the last update and one year ago from now.
I suggest trying both approaches against realistic test data to see which will perform better - I would expect summing the whole year to perform at least as well where data is relatively sparse, while the difference method may work better if data is to be frequently updated on each account.
I'll avoid any actual SQL here as it varies a lot depending on the variety of SQL that you are using.
You say that you have a trigger to maintain the existing running total.
I presume that it also (or perhaps a nightly process) creates new daily records in the account_value table. Then INSERTs, UPDATEs and DELETEs fire the trigger to add or subtract from the existing running total?
The only changes you need to make are:
- add a new field, "yearly_value" or something
- have the existing trigger update that in the same way as the existing field
- use gbn's type of answer to create today's records (or however far you backdate)
- but initialise each new daily record in a slightly different way...
When you insert a new row for a new day, it should be initialised to yesterday's value minus the value of the transactions from 365 days ago, which have just dropped out of the window. After that, the behavior should be identical to what you're already used to.
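The daily roll-forward can be checked against a full recomputation with a few lines of plain Python. This is a sketch of the arithmetic only, not of the trigger. The function names and sample transactions are made up, and the window is taken to be the 365 days ending today.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=365)


def full_sum(transactions, today):
    """Reference answer: sum everything dated within the 365 days ending today."""
    return sum(v for d, v in transactions if today - WINDOW < d <= today)


def opening_value(yesterday_total, transactions, today):
    """New day's starting value: yesterday's total minus what just expired."""
    expired_day = today - WINDOW
    expired = sum(v for d, v in transactions if d == expired_day)
    return yesterday_total - expired


transactions = [
    (date(2023, 1, 10), 100),  # leaves the window on 2024-01-10
    (date(2023, 6, 1), 50),
    (date(2024, 1, 10), 5),    # arrives on 2024-01-10
]

today = date(2024, 1, 10)
yesterday_total = full_sum(transactions, today - timedelta(days=1))
start_of_day = opening_value(yesterday_total, transactions, today)

# The existing trigger would then add today's transactions as they arrive.
todays_inserts = sum(v for d, v in transactions if d == today)
incremental_total = start_of_day + todays_inserts
```

Here the incremental total matches full_sum(transactions, today), which is the property the new daily record relies on. Backdated inserts, updates and deletes would still need the trigger to adjust every daily record whose window covers the changed date.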