I'm a self-taught data-querying guy and am wholly unfamiliar with creating tables and such. The database I'm working on does have a calendar table, but it is forward-looking only and extends just three years out. I need to create a date table of end-of-month records between two dates, including dates before the system's calendar table begins.
How best can one create this in Snowflake SQL?
Thank you much
This will create N end-of-month records, where N is the rowcount passed to generator. Change the start date, and set N to the number of months between your two dates.
select
    row_number() over (order by null) as id,
    add_months('2020-01-01'::date, id) - 1 as month_end
from table(generator(rowcount => 100))
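The same "first day of the next month, minus one day" idea can be sketched outside the database to sanity-check the expected rows. This is only an illustration in Python; the function name `month_ends` is hypothetical, and it assumes you want the month end for every month touched by the range, including the month of the end date.

```python
from datetime import date, timedelta


def month_ends(start: date, end: date) -> list[date]:
    """Return the last day of every month from start's month through end's month."""
    results = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # The first day of the following month, minus one day, is this month's end.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        results.append(date(next_year, next_month, 1) - timedelta(days=1))
        year, month = next_year, next_month
    return results


ends = month_ends(date(2020, 1, 1), date(2020, 4, 15))
for d in ends:
    print(d.isoformat())
```

The length of the returned list is the N you would pass as rowcount for the same date range.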
I have a table containing years of data but no date or timestamp columns. Now I have to fetch the last year's data. How can I achieve that when the table does not have any timestamp or date columns?
How can I achieve that when the table does not have any timestamp or date columns?
In general, you cannot; if you do not have any data inside the table to tell you a date associated with the row then there is not any meta-data that will tell you.
If you have enabled flashback (with a large enough history) on the table then you could compare the state of the table now to the state of the table a year ago using something like:
SELECT * FROM table_name
MINUS
SELECT * FROM table_name AS OF TIMESTAMP ADD_MONTHS(SYSDATE, -12);
I'm working in BigQuery and have created a view over multiple tables. Each day, the data needs to be synced with multiple platforms. I need to add a date or some other field via SQL that lets me identify which rows were added to the view or updated each day, so that I can forward only those rows instead of syncing everything every day. The best approach I can think of is to stamp a row with the current date whenever it is updated, with that date staying constant until the record is updated again.
Ex:
Sample data
Say we get the view T1 on 1st September and T2 on 2nd September. I need to spot only ID 2 on 1st September and IDs 3, 4, and 5 on 2nd September. Note: there is no such date column. I need help creating such a column, or any other approach to identify which rows are added or updated each day.
You can create a BigQuery scheduled query with a daily (24-hour) frequency using the INSERT statement below:
INSERT INTO dataset.T1
SELECT
*
FROM
dataset.T2
WHERE
date > (SELECT MAX(date) FROM dataset.T1);
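To see what that incremental INSERT does, here is a small self-contained sketch using Python's built-in sqlite3 module rather than BigQuery. The tables `t1` and `t2` and their contents are made up for illustration, and the sketch assumes, as the statement above does, that the source rows carry a `date` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, date TEXT);  -- rows already synced (hypothetical)
    CREATE TABLE t2 (id INTEGER, date TEXT);  -- latest full snapshot (hypothetical)
    INSERT INTO t1 VALUES (1, '2021-09-01'), (2, '2021-09-01');
    INSERT INTO t2 VALUES (1, '2021-09-01'), (2, '2021-09-01'),
                          (3, '2021-09-02'), (4, '2021-09-02');
""")

# Copy only the rows that are newer than the newest date already in t1.
conn.execute("""
    INSERT INTO t1
    SELECT * FROM t2
    WHERE date > (SELECT MAX(date) FROM t1)
""")

synced_ids = [row[0] for row in conn.execute("SELECT id FROM t1 ORDER BY id")]
print(synced_ids)
```

One thing to watch: if the target table is empty, MAX(date) is NULL, the comparison is never true, and nothing is inserted, so the first load has to be done separately.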
The table the data is streamed into (in your case, the sample data) needs to be configured as a partitioned table. Choose "Partition by ingestion time" so that you don't need to handle the date yourself.
Configuration in BQ
After recreating the table, append your existing data to the new table using the format options in BQ (append) and click RUN.
Then you create a view based on that table with:
SELECT * EXCEPT (rank)
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY invoice_id ORDER BY _PARTITIONTIME DESC) AS rank
FROM `your_dataset.your_sample_data_table`
)
WHERE rank = 1
From then on, always query the view.
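The "keep only the newest row per key" logic of that view can be sketched in plain Python. The field name `ingested_at` stands in for the ingestion-time partition column, and the sample rows are invented for illustration.

```python
from datetime import date, datetime

# Hypothetical sample rows; invoice 2 was ingested twice.
rows = [
    {"invoice_id": 1, "amount": 10, "ingested_at": datetime(2021, 9, 1)},
    {"invoice_id": 2, "amount": 20, "ingested_at": datetime(2021, 9, 1)},
    {"invoice_id": 2, "amount": 25, "ingested_at": datetime(2021, 9, 2)},
]


def latest_per_key(rows, key="invoice_id", order="ingested_at"):
    """Keep the most recently ingested row for each key value."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order] > current[order]:
            latest[row[key]] = row
    return list(latest.values())


current_rows = latest_per_key(rows)

# Rows whose newest version arrived on a given day are that day's changes.
changed_on_sep_2 = [
    r["invoice_id"] for r in current_rows if r["ingested_at"].date() == date(2021, 9, 2)
]
print(changed_on_sep_2)
```

Filtering the deduplicated rows by ingestion date is what lets a daily sync pick up only the rows added or updated that day.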
I have an external Hive table employee which is partitioned by extract_time (yyyy-mm-dd hh:mm:ss) as below.
empid empname extract_time
1 abc 2019-05-17 00:00:00
2 def 2019-05-18 14:21:00
I am trying to remove the partition by extract_time and change it to year, month and day partitions. I am following the method below.
1. Create a new table employee_new with partitions year month and day
create external table employee_new
(empid int,
empname string
)
partitioned by (year int,month int,day int)
location '/user/emp/data/employee_new.txt';
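2. insert overwrite into employee_new by selecting data from employee table
-- dynamic partitioning may require: set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table employee_new partition (year, month, day)
select empid, empname, year(extract_time), month(extract_time), day(extract_time)
from employee;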
3. Drop employee and employee_new and create employee table on top of /user/emp/data/employee_new.txt
Please let me know if this method is efficient and if there are any better ways to do the same.
If possible, partition by date (yyyy-MM-dd) only, provided the upstream process can write the hourly files to daily folders. For such a small table, partitioning by year, month and day separately seems like overkill and will still produce too many folders.
If table is partitioned by date yyyy-MM-dd, partition pruning will work for your usage scenario because you are querying by day or year or month.
To filter by year in this case, use the condition
where date >= '2019-01-01' and date < '2020-01-01'
to filter by month:
where date >= '2019-01-01' and date < '2019-02-01'
and to filter by day:
where date = '2019-01-01'
Filesystem listing will work much faster.
And if it is not possible to redesign upstream process to write to yyyy-MM-dd folders then your new design as you described in the question (yyyy/MM/dd folders) is the only solution.
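The year and month filters above are half-open ranges: the start date is included and the first day of the next period is excluded. A small Python sketch of how those bounds could be computed; the helper names are hypothetical.

```python
from datetime import date


def year_bounds(year: int) -> tuple[date, date]:
    """Half-open range [Jan 1 of year, Jan 1 of the next year)."""
    return date(year, 1, 1), date(year + 1, 1, 1)


def month_bounds(year: int, month: int) -> tuple[date, date]:
    """Half-open range [first of month, first of the next month)."""
    if month == 12:
        return date(year, 12, 1), date(year + 1, 1, 1)
    return date(year, month, 1), date(year, month + 1, 1)


def in_range(d: date, bounds: tuple[date, date]) -> bool:
    """Mirror of: where date >= start and date < end."""
    start, end = bounds
    return start <= d < end


print(month_bounds(2019, 1))
print(in_range(date(2019, 1, 31), month_bounds(2019, 1)))
```

Using an exclusive upper bound avoids having to know how many days the month has.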
I know there are a lot of solutions to this but I am looking for a simple query to get all the dates between two dates.
I cannot declare variables.
As per the comment above, it's just guesswork without your table structures and further detail. Also, are you using a 3NF database or a star schema? Is this a transactional system or a data warehouse?
As a general answer, I would recommend creating a Calendar table, that way you can create multiple columns for Working Day, Weekend Day, Business Day, etc. and add a date key value, starting at 1 and incrementing each day.
Your query then is a very simple sub-select or join to the table to do something like
SELECT date FROM Calendar WHERE date BETWEEN <x> AND <y>
How to create a Calender table for 100 years in Sql
There are other options like creating the calendar table using iterations (eg, as a CTE table) and linking to that.
SQL - Create a temp table or CTE of first day of the month and month names
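As a rough illustration of the iterative (recursive CTE) option, here is a sketch that runs against SQLite through Python's built-in sqlite3 module. The syntax is SQLite's; other engines spell the recursion and the date arithmetic differently, so treat it as a pattern rather than something to paste in. The function name `dates_between` is hypothetical.

```python
import sqlite3


def dates_between(start: str, end: str) -> list[str]:
    """Return every ISO date from start to end inclusive, via a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    query = """
        WITH RECURSIVE calendar(d) AS (
            SELECT date(?)                    -- anchor row: the start date
            UNION ALL
            SELECT date(d, '+1 day')          -- add one day per iteration
            FROM calendar
            WHERE d < date(?)                 -- stop once the end date is reached
        )
        SELECT d FROM calendar WHERE d <= date(?)
    """
    rows = conn.execute(query, (start, end, end)).fetchall()
    conn.close()
    return [row[0] for row in rows]


days = dates_between("2020-02-27", "2020-03-02")
print(days)
```

No variables are declared; the two dates are passed as query parameters, and the final WHERE returns an empty result if the start date is after the end date.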
I'm trying to generate monthly records in one table based on instructions in another table. Software - MS Access 2007, though I'm looking for an SQL solution here. To greatly simplify the matter, let's say the following describes the tables:
TaskManager:
- DayDue
- TaskName
Task:
- DateDue
- TaskName
So what happens is that there may be an entry in TaskManager {15, "Accounts due"}, so this should lead to an "Accounts due" record in the Task table with the due date being the 15th of each month. I'd want it to create records for the last few months and the next year.
What I'm thinking that I need to do is first create a SELECT query that results in x records for each record in the TaskManager table, with a date for each month. After that, I run an INSERT query that inserts those records into the Task table if they do not already exist there.
I think I can manage the INSERT query, though I'm having trouble figuring out how to do the SELECT query. Could someone give me a pointer?
You could use a calendar table.
INSERT INTO Task ( DateDue, TaskName )
SELECT calendar.CalDate, TaskManager.TaskName
FROM calendar, TaskManager
WHERE (((Day([CalDate]))=TaskManager.DayDue)
AND ((calendar.CalDate)<#7/1/2013#));
The calendar table would simply contain all dates and other such relevant fields as work day (yesno). Calendar tables are generally quite useful.
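To make the effect of the join condition concrete, here is a small Python sketch that walks a date range the way the calendar table would and keeps the dates whose day of month equals DayDue. The function name `due_dates` and the date range are illustrative only.

```python
from datetime import date, timedelta


def due_dates(day_due: int, start: date, end: date) -> list[date]:
    """Dates in [start, end) whose day of month equals day_due."""
    matches = []
    current = start
    while current < end:
        if current.day == day_due:
            matches.append(current)
        current += timedelta(days=1)
    return matches


mid_month = due_dates(15, date(2013, 1, 1), date(2013, 7, 1))
month_end = due_dates(31, date(2013, 1, 1), date(2013, 7, 1))
print(len(mid_month), len(month_end))
```

Note that a DayDue of 29, 30 or 31 simply produces no record in months that are too short, which matches what the join on Day([CalDate]) would do.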
Here is the solution I developed using Remou's Calendar table idea.
First create a Calendar table, which simply contains all dates for a desired range. It's easy to just make the dates in Excel and paste them into the table. This is also a very reliable way of doing it, as Excel handles leap years correctly for the modern range of dates.
After building this table, there are three queries to run. The first is a SELECT, which selects every possible task generated by the TaskManager based on the date and frequency. This query is called TaskManagerQryAllOptions, and has the following code:
SELECT TaskManager.ID, Calendar.CalendarDate
FROM TaskManager INNER JOIN Calendar ON
TaskManager.DateDay = Day(Calendar.CalendarDate)
WHERE (TaskManager.Frequency = "Monthly")
OR (TaskManager.Frequency = "Yearly" AND
TaskManager.DateMonth = Month(Calendar.CalendarDate))
OR (TaskManager.Frequency = "Quarterly" AND
(((Month(Calendar.CalendarDate)- TaskManager.DateMonth) Mod 3) = 0));
Most of the WHERE clause above handles the different frequencies, in particular the months that a quarterly Day and Month pair can fall on. The next step is another SELECT query, which selects records from TaskManagerQryAllOptions in which the date is within the required range. This query is called TaskManagerQrySelect.
SELECT TaskManagerQryAllOptions.ID, TaskManager.TaskName,
TaskManagerQryAllOptions.CalendarDate
FROM TaskManagerQryAllOptions INNER JOIN TaskManager
ON TaskManagerQryAllOptions.ID = TaskManager.ID
WHERE (TaskManagerQryAllOptions.CalendarDate > Date()-60)
AND (TaskManagerQryAllOptions.CalendarDate < Date()+370)
AND (TaskManagerQryAllOptions.CalendarDate >= TaskManager.Start)
AND ((TaskManagerQryAllOptions.CalendarDate <= TaskManager.Finish)
OR (TaskManager.Finish Is Null))
ORDER BY TaskManagerQryAllOptions.CalendarDate;
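The frequency rules in TaskManagerQryAllOptions can be read as a single predicate. The following Python sketch mirrors them for one task and one calendar date; the function name is hypothetical and the parameters correspond to the Frequency, DateDay and DateMonth columns.

```python
from datetime import date


def task_occurs_on(frequency: str, date_day: int, date_month: int, d: date) -> bool:
    """Mirror of the join and WHERE conditions for a single calendar date."""
    if d.day != date_day:  # join condition: DateDay = Day(CalendarDate)
        return False
    if frequency == "Monthly":
        return True
    if frequency == "Yearly":
        return d.month == date_month
    if frequency == "Quarterly":
        # Every third month counted from DateMonth, in either direction.
        return (d.month - date_month) % 3 == 0
    return False


quarterly_months = [
    m for m in range(1, 13) if task_occurs_on("Quarterly", 15, 2, date(2013, m, 15))
]
print(quarterly_months)
```

Python's % and Access's Mod treat negative operands differently, but both give 0 exactly when the month difference is a multiple of three, so the equality test agrees.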
The final query is an INSERT. As we will be using this query frequently, we don't want it to generate duplicates, so we need to filter out already created records.
INSERT INTO Task ( TaskName, TaskDate )
SELECT TaskManagerQrySelect.TaskName, TaskManagerQrySelect.CalendarDate
FROM TaskManagerQrySelect
WHERE Not Exists(
SELECT *
FROM Task
WHERE Task.TaskName = TaskManagerQrySelect.TaskName
AND Task.TaskDate = TaskManagerQrySelect.CalendarDate);
One limitation of this method is that if the date of repetition (e.g. the 15th of each month) is changed, the future records with the wrong day will remain. A solution to this would be to update all the future records with the adjusted date, then run the insert.
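The duplicate-safe INSERT can be tried out with Python's built-in sqlite3 module. In this sketch the table `Candidates` is a hypothetical stand-in for the TaskManagerQrySelect query, and the rows are made up; running the statement twice shows that the second run adds nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Task (TaskName TEXT, TaskDate TEXT);
    -- Hypothetical stand-in for the TaskManagerQrySelect query results.
    CREATE TABLE Candidates (TaskName TEXT, CalendarDate TEXT);
    INSERT INTO Candidates VALUES ('Accounts due', '2013-01-15'),
                                  ('Accounts due', '2013-02-15');
""")

INSERT_NEW = """
    INSERT INTO Task (TaskName, TaskDate)
    SELECT c.TaskName, c.CalendarDate
    FROM Candidates AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM Task AS t
        WHERE t.TaskName = c.TaskName AND t.TaskDate = c.CalendarDate
    )
"""

conn.execute(INSERT_NEW)  # first run inserts both candidate rows
conn.execute(INSERT_NEW)  # second run finds them already present
task_count = conn.execute("SELECT COUNT(*) FROM Task").fetchone()[0]
print(task_count)
```

Because the check is on task name and date together, a record whose day has since been changed in TaskManager is treated as a different row, which is the limitation described above.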
One possibility could be to create a table of Months, and a table of Years (prior year, current, and next one). I could run a SELECT query which takes the Day from the TaskManager table, the Month from the Month table, and the Year from the Year table - I imagine that this could somehow create my desired multiple records for a single TaskManager record. Though I'm not sure what the exact SQL would be.
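The months-times-years idea amounts to a cross join of the two lists combined with the day from TaskManager. A minimal Python sketch of that combination follows; the function name `dates_for_day` is hypothetical, and skipping impossible dates such as 31 April is an assumption about the desired behaviour.

```python
from datetime import date
from itertools import product


def dates_for_day(day: int, years, months=range(1, 13)) -> list[date]:
    """Cross every year with every month and attach the given day of month."""
    results = []
    for year, month in product(years, months):
        try:
            results.append(date(year, month, day))
        except ValueError:
            # Day does not exist in this month (e.g. 31 April); skip it.
            continue
    return results


fifteenths = dates_for_day(15, [2012, 2013, 2014])
thirty_firsts = dates_for_day(31, [2013])
print(len(fifteenths), len(thirty_firsts))
```

In SQL the same effect would come from listing the Months and Years tables in the FROM clause without a join condition, so that every combination is produced for each TaskManager record.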