Partition expiry countdown for date/time field based partition - google-bigquery

As of May 2021, the Google BigQuery documentation does not clearly state when the partition expiry countdown starts for a table partitioned on a date/time column. Is the date/time of the partition itself the start of the expiry countdown, or does the countdown start when the partition is created?
For example, suppose a table like the following is created:
CREATE TABLE IF NOT EXISTS `project_id.dataset_name.table_name`
(
dateTime TIMESTAMP NOT NULL
, trainName STRING
, fleet STRING
, customer STRING
)
PARTITION BY DATE(dateTime)
OPTIONS (
partition_expiration_days = 3
)
So if the table is created on, say, the 5th of the month, and data for the 1st of that month (in the dateTime field) is then inserted, will that data already be expired upon insertion? Or will it expire on the 9th of the same month?
For ingestion-time partitioning this confusion does not arise, because the ingestion timestamp itself is the partition timestamp.
References:
Create a time-unit column-partitioned table
Updating default partition expiration times
Use the expiration settings to remove unneeded tables and partitions

That data will expire on insertion. The date/time of the partition itself is the start of the partition expiry countdown, so data for the 1st of the month will not be present in the table.
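A small Python sketch of the expiry arithmetic makes the answer concrete. It assumes the behaviour the BigQuery documentation describes for daily partitions: the countdown runs from the partition boundary in UTC (midnight at the end of the partition's day), not from when rows arrive or when the table was created. The function names are illustrative only, not part of any BigQuery API.

```python
from datetime import date, datetime, time, timedelta, timezone

def partition_expiry(partition_day: date, expiration_days: float) -> datetime:
    # Assumption: for daily partitions the countdown starts at the partition's
    # upper boundary, i.e. midnight UTC at the end of partition_day.
    boundary = datetime.combine(partition_day + timedelta(days=1), time(0), tzinfo=timezone.utc)
    return boundary + timedelta(days=expiration_days)

def is_expired(partition_day: date, expiration_days: float, now: datetime) -> bool:
    # A partition is treated as gone once "now" has reached its expiry time.
    return now >= partition_expiry(partition_day, expiration_days)

# Table created and loaded on the 5th with partition_expiration_days = 3.
insert_time = datetime(2021, 5, 5, 9, 0, tzinfo=timezone.utc)
print(partition_expiry(date(2021, 5, 1), 3))          # 2021-05-05 00:00:00+00:00
print(is_expired(date(2021, 5, 1), 3, insert_time))   # True
print(partition_expiry(date(2021, 5, 5), 3))          # 2021-05-09 00:00:00+00:00
```

Under this model the partition for the 1st has already expired at midnight UTC on the 5th, before the insert runs, while the 9th is the expiry date of the partition for the 5th.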

Related

Automatically add date for each day in SQL

I'm working in BigQuery and have created a view using multiple tables. Each day, the data needs to be synced with multiple platforms. I need to add a date (or some other field) via SQL that lets me identify which rows were added to the view each day, or which rows were updated, so that I only carry that data forward each day instead of syncing everything every day. The best approach I can think of is to somehow set the current date whenever a row is updated, but that date needs to stay constant until the record is updated again.
Ex:
Sample data
Say we get the view as T1 on 1st September and as T2 on the 2nd. I need to spot only ID 2 for 1st September and IDs 3, 4 and 5 for 2nd September. Note: there is no such date column. I need help creating such a column, or any other approach to identify which rows are updated or added daily.
You can create a BigQuery scheduled query with a daily (24-hour) frequency using the INSERT statement below:
INSERT INTO dataset.T1
SELECT
*
FROM
dataset.T2
WHERE
date > (SELECT MAX(date) FROM dataset.T1);
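To see what the scheduled query does on a given day, here is a self-contained sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery. The T1/T2 contents and the date column are hypothetical, chosen to mirror the question's example (IDs 3, 4 and 5 arriving on 2nd September).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (id INTEGER, date TEXT);  -- target: rows already synced (hypothetical)
CREATE TABLE T2 (id INTEGER, date TEXT);  -- source: today's snapshot (hypothetical)
INSERT INTO T1 VALUES (1, '2021-08-31'), (2, '2021-09-01');
INSERT INTO T2 VALUES (1, '2021-08-31'), (2, '2021-09-01'),
                      (3, '2021-09-02'), (4, '2021-09-02'), (5, '2021-09-02');
""")

# Same shape as the scheduled query: copy only rows newer than the latest date in T1.
conn.execute("""
INSERT INTO T1
SELECT * FROM T2
WHERE date > (SELECT MAX(date) FROM T1)
""")

new_ids = [r[0] for r in conn.execute("SELECT id FROM T1 WHERE date = '2021-09-02' ORDER BY id")]
print(new_ids)  # [3, 4, 5]
```

One caveat of this pattern: if T1 starts empty, MAX(date) is NULL, the comparison is never true, and nothing is copied, so the first load has to be done separately (or the subquery wrapped in COALESCE with a suitably early date).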
The table your data is streamed into (in your case, the sample data) needs to be configured as a partitioned table. Use "Partition by ingestion time" so that you don't need to handle the date yourself.
Configuration in BQ
After you have recreated that table, append your existing data to the new table using the append option in the BQ query settings, and run the query.
Then create a view based on that table:
SELECT * EXCEPT (rank)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY invoice_id ORDER BY _PARTITIONTIME DESC) AS rank
  FROM `your_dataset.your_sample_data_table`
)
WHERE rank = 1
From then on, always query the view.
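The view's deduplication is the standard ROW_NUMBER pattern: number the versions of each invoice_id from newest to oldest and keep number 1. Below is a runnable sketch with Python's sqlite3 (window functions need SQLite 3.25 or later). SQLite has no _PARTITIONTIME pseudo-column, so a hypothetical ingested_at column stands in for it, and the row-number alias is rn; the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the ingestion-time partitioned table.
conn.executescript("""
CREATE TABLE sample_data (invoice_id INTEGER, amount REAL, ingested_at TEXT);
INSERT INTO sample_data VALUES
  (1, 10.0, '2021-09-01'),
  (2, 20.0, '2021-09-01'),
  (2, 25.0, '2021-09-02'),  -- later version of invoice 2
  (3, 30.0, '2021-09-02');
""")

# Keep only the newest version of each invoice_id.
latest = conn.execute("""
SELECT invoice_id, amount, ingested_at
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY invoice_id ORDER BY ingested_at DESC) AS rn
  FROM sample_data
)
WHERE rn = 1
ORDER BY invoice_id
""").fetchall()
print(latest)
# [(1, 10.0, '2021-09-01'), (2, 25.0, '2021-09-02'), (3, 30.0, '2021-09-02')]
```

PARTITION BY (not GROUP BY) is what scopes the numbering to each invoice; every input row survives into the subquery, and the outer filter discards the older versions.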

Split a Hive partition to create multiple partitions

I have an external Hive table that is partitioned on load_date (DD-MM-YYYY). However, the very first period, let's say 01-01-2000, holds all the data from 1980 to 2000. How can I further partition that earlier data by year while keeping the existing data (data with a load_date later than 01-01-2000) available?
First, load the data of the '01-01-2000' partition into a separate table, then create a dynamically partitioned table and load the '01-01-2000' data into it. This might solve your problem.
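The suggestion relies on Hive's dynamic partitioning, where the partition value comes from each row rather than being fixed in the statement (in HiveQL, an INSERT ... PARTITION (col) SELECT ... with hive.exec.dynamic.partition.mode=nonstrict). As a language-neutral sketch, the Python below splits the rows of the catch-all 01-01-2000 load into per-year buckets. The event_date field and the sample rows are hypothetical; the question does not name the column that holds the original 1980 to 2000 dates.

```python
from collections import defaultdict
from datetime import date

def split_by_year(rows):
    """Group rows by the year of their (hypothetical) event_date field,
    the way a dynamic-partition insert derives one partition per distinct year."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["event_date"].year].append(row)
    return dict(partitions)

# Rows that all landed in the load_date = '01-01-2000' partition.
legacy_rows = [
    {"id": 1, "event_date": date(1980, 3, 2)},
    {"id": 2, "event_date": date(1995, 7, 14)},
    {"id": 3, "event_date": date(1995, 11, 30)},
    {"id": 4, "event_date": date(1999, 12, 31)},
]
by_year = split_by_year(legacy_rows)
print(sorted(by_year))  # [1980, 1995, 1999]
```

Partitions for load dates after 01-01-2000 are untouched by this step, which is what keeps the newer data available while the old block is reorganized.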

Oracle query to fetch data based on an order without using ORDER BY

Below is the problem statement:
I have a table into which we insert data along with the hospital's business date and a create timestamp.
The business date (Bday) defines the logical day for which the hospital has done business, and the create timestamp (create_ts) is the time at which the record was inserted into our database.
The business date can be 1 day ahead of the create timestamp, because the create timestamp is in PST and the hospital can be in the South Asia or Australia region.
Also, a hospital can open for the next business date if it accidentally closed today's business date in the application.
I need to sync the records from a staging database to the main database.
I want to sync the records with the minimum create timestamp first, but ORDER BY is a costly operation, as sometimes more than 100k records need to be synced.
10 different threads sync the records, each with a batch size of 50.
The first solution we tried was to pick up the records with business date = minimum business date, but there were some complications:
1) Some records with the minimum business date were not syncing due to an issue, and as a result the next day's records were never picked up without manual intervention.
2) Some hospitals opened for a future business date (for example, today is the 13th and they accidentally closed it, so they opened for business date 14th). These records were not processed because their business date was greater than the minimum business date.
We thought of picking records whose create timestamp is between the minimum create timestamp and that timestamp +1 hour.
But the issue from point (1) could occur again, and there may be no records to sync between the stuck record with the minimum create timestamp and +1 hour.
Please suggest a solution for the query.
A few of the columns in the table are: Hname (hospital name), Hloc (hospital location), Dseq (per-day sequence number), Bday (business date), create_ts and modify_ts.
Oracle doesn't guarantee the order of the result set if you don't use an ORDER BY clause.
Instead, you may consider using CDC (Change Data Capture) for Oracle 11g or GoldenGate for Oracle 12c; it should be very efficient. New data will be replicated almost immediately, and Oracle will take care of it.
In general: no ORDER BY means no guarantee on the order of the result set.
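The answer's core point is that only ORDER BY guarantees an order. As a rough illustration of keeping that guarantee while limiting the work per batch, here is a sketch using Python's built-in sqlite3 as a stand-in for Oracle: each batch asks only for the 50 oldest unsynced rows, and an index on (synced, create_ts) can let the engine read rows in create_ts order instead of sorting the whole backlog. The staging table, the synced flag and the index are hypothetical, not the asker's schema, and Oracle's optimizer may behave differently (the Oracle 12c equivalent of LIMIT is FETCH FIRST 50 ROWS ONLY).

```python
import random
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
# Hypothetical staging table with only the columns this sketch needs.
conn.execute("CREATE TABLE staging (dseq INTEGER, hname TEXT, create_ts TEXT, synced INTEGER DEFAULT 0)")
# Index on the filter + ordering key, so a batch can be read in create_ts order.
conn.execute("CREATE INDEX idx_staging_unsynced ON staging (synced, create_ts)")

base = datetime(2021, 5, 13, 0, 0)
rows = [(i, f"H{i % 7}", (base + timedelta(minutes=i)).isoformat()) for i in range(1000)]
random.shuffle(rows)  # insertion order deliberately differs from create_ts order
conn.executemany("INSERT INTO staging (dseq, hname, create_ts) VALUES (?, ?, ?)", rows)

BATCH_SIZE = 50
batch = conn.execute(
    "SELECT dseq, create_ts FROM staging WHERE synced = 0 ORDER BY create_ts LIMIT ?",
    (BATCH_SIZE,),
).fetchall()
print([dseq for dseq, _ in batch][:5])  # [0, 1, 2, 3, 4]
```

Because the rows were shuffled before insertion, the correct batch comes back only because of the ORDER BY, not because of any physical storage order.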

How to query data from latest partition in interval partitioned table

I have an interval partitioned table: PARTITION_TEST.
I need to select data from the last partition. Is it possible without referring to the DBA_TAB_PARTITIONS view?
This table will have only 5 partitions at a time. Is there any way to select data by partition position, i.e. from the 5th partition?
Something like,
SELECT * FROM PARTITION_TEST partition_position(5)?

Creating a table with date as range partition in SQL Server 2012

I am new to SQL Server coding. Please let me know how to create a table with a range partition on date in SQL Server.
A similar syntax in Teradata would be the following (a table is created with ORDER_DATE as a range partition over the year 2012, with each day as a single partition):
CREATE TABLE ORDER_DATA (
ORDER_NUM INTEGER NOT NULL
,CUST_NUM INTEGER
,ORDER_DATE DATE
,ORDER_TOT DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NUM)
PARTITION BY (RANGE_N (ORDER_DATE BETWEEN DATE '2012-01-01' AND DATE '2012-12-31' EACH INTERVAL '1' DAY));
Thanks in advance
The process of creating a partitioned table is described on MSDN as follows:
Creating a partitioned table or index typically happens in four parts:
1. Create a filegroup or filegroups and corresponding files that will hold the partitions specified by the partition scheme.
2. Create a partition function that maps the rows of a table or index into partitions based on the values of a specified column.
3. Create a partition scheme that maps the partitions of a partitioned table or index to the new filegroups.
4. Create or modify a table or index and specify the partition scheme as the storage location.
You can find code samples on MSDN.
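Step 2 is where the date ranges are actually defined (in T-SQL, CREATE PARTITION FUNCTION ... AS RANGE RIGHT FOR VALUES (...)). Since a SQL Server instance can't be run inline, here is a minimal Python sketch of how such a function assigns a date to a 1-based partition number, the same kind of number SQL Server's $PARTITION function reports. The boundary list mirrors the Teradata example (one partition per day of 2012) and is an illustrative assumption, as is the helper name.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

# One boundary per day from 2012-01-01 through 2013-01-01 (2012 is a leap year:
# 366 days, 367 boundaries), so with RANGE RIGHT every day of 2012 gets its own partition.
BOUNDARIES = [date(2012, 1, 1) + timedelta(days=d) for d in range(367)]

def partition_number(value, boundaries=BOUNDARIES, range_right=True):
    """Return the 1-based partition a value maps to.

    RANGE RIGHT: each boundary value belongs to the partition on its right.
    RANGE LEFT:  each boundary value belongs to the partition on its left.
    """
    if range_right:
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1

print(partition_number(date(2011, 12, 31)))  # 1   (everything before 2012)
print(partition_number(date(2012, 1, 1)))    # 2
print(partition_number(date(2012, 3, 15)))   # 76
print(partition_number(date(2012, 12, 31)))  # 367
print(partition_number(date(2013, 1, 1)))    # 368 (2013 onwards)
```

RANGE RIGHT is the usual choice for date boundaries because each boundary then marks the first day of its partition; with RANGE LEFT, 2012-01-01 would fall into partition 1 together with all earlier dates.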