I have an interval-partitioned table: PARTITION_TEST.
I need to pick data from the last partition. Is it possible without referring to the dba_tab_partitions view?
This table will have only 5 partitions at a time. Is there any way to select data from the partition at position 5?
Something like,
SELECT * FROM PARTITION_TEST partition_position(5)?
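There is no partition_position(n) selector in Oracle SQL, but for reference you can address a partition by a key value rather than by position with the extended PARTITION FOR clause (handy for interval partitions, whose names are system-generated). A minimal sketch, assuming the interval partition key is a DATE column and the literal below is just a hypothetical value that falls inside the newest partition:

-- picks the partition containing the given key value, without touching the dictionary
SELECT *
FROM partition_test PARTITION FOR (DATE '2024-06-15');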
I have a table containing years of data but no date or timestamp columns. Now I have to fetch the last year's data. How can I achieve that when the table does not have any timestamp or date columns?
In general, you cannot; if the rows themselves contain nothing that tells you a date, there is no metadata that will tell you either.
If you have enabled flashback on the table (with sufficient undo retention or a Flashback Data Archive covering the period), then you could compare the state of the table now to the state of the table a year ago using something like:
SELECT * FROM table_name
MINUS
SELECT * FROM table_name AS OF TIMESTAMP ADD_MONTHS(SYSDATE, -12);
I have a BigQuery table with ~5k unique IDs. Every day new rows are inserted for IDs that may or may not already exist.
We use this query to find the most recent rows:
SELECT t.* EXCEPT (seqnum)
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id
                                ORDER BY date_of_data DESC
                               ) AS seqnum
      FROM `[project]`.[dataset].[table] t
     ) t
WHERE seqnum = 1
Although we only want the most recent row for each ID, this query must scan the entire table, so it gets slower and more expensive every day as the table grows. Right now, for an 8 GB table, the query above produces a 22 MB table. We would much rather query the 22 MB table if it could stay up to date.
Is it possible to create a materialized view that gets the latest rows for each ID?
Is there a better solution than growing tables to infinity?
Other requirements:
Keep historical data (somewhere)
Can't use updates - we would do more than 1,500 per day - https://cloud.google.com/bigquery/quotas
One solution would be to partition your main table (the one with all rows) by the date_of_data column with daily granularity.
Then create a separate table that keeps only the most recent row for each ID. Populate it once with a single scan of the entire main table, and from then on update it every day by querying only the last day of the main table. Thanks to partition pruning, that daily query scans only the most recent partition instead of the whole table.
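A minimal sketch of that setup, assuming date_of_data is a DATE column and the row payload is reduced to a single column called payload; the table names are placeholders, not your real ones:

-- One-time: recreate the main table partitioned by date_of_data.
CREATE TABLE `your_project.your_dataset.main_partitioned`
PARTITION BY date_of_data AS
SELECT id, date_of_data, payload
FROM `your_project.your_dataset.main`;

-- Daily scheduled query: refresh the small "latest row per ID" table
-- from yesterday's partition only (one DML statement per day, well under quota).
MERGE `your_project.your_dataset.latest` AS l
USING (
  SELECT id, date_of_data, payload
  FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_of_data DESC) AS seqnum
    FROM `your_project.your_dataset.main_partitioned` AS t
    WHERE date_of_data = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  )
  WHERE seqnum = 1
) AS d
ON l.id = d.id
WHEN MATCHED THEN
  UPDATE SET date_of_data = d.date_of_data, payload = d.payload
WHEN NOT MATCHED THEN
  INSERT (id, date_of_data, payload) VALUES (d.id, d.date_of_data, d.payload);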
I'm working on BigQuery and have created a view using multiple tables. Each day the data needs to be synced with multiple platforms, so I need to add a date or some other field via SQL through which I can identify which rows were added to the view each day, or which rows were updated, so that I only carry that data forward each day instead of syncing everything. The best approach I can think of is to somehow stamp the current date wherever a row is updated, but that date needs to stay constant until a further update happens for that record.
Ex:
Sample data
Say we get the view T1 on 1st September and T2 on the 2nd. I need to spot only ID 2 for 1st September and IDs 3, 4, 5 for 2nd September. Note: no such date column exists. I need help creating such a column, or any other approach, to identify which rows are getting updated or added daily.
You can create a BigQuery scheduled query with a daily (24 hour) frequency using the INSERT statement below:
INSERT INTO dataset.T1
SELECT *
FROM dataset.T2
WHERE date > (SELECT MAX(date) FROM dataset.T1);
The table the data is getting streamed into (in your case: the sample data table) needs to be configured as a partitioned table. For that, use "Partition by ingestion time" so that you don't need to handle the date yourself.
Configuration in BQ
After you have recreated that table, append your existing data to the new table using the "Append to table" write option in BigQuery and run the job.
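If you prefer SQL over the console, a minimal sketch of the same two steps; the column list and the old table name are assumptions, only invoice_id comes from the view below:

-- Ingestion-time partitioned target table (partitioned on the _PARTITIONTIME pseudo column).
CREATE TABLE `your_dataset.your_sample_data_table`
(
  invoice_id STRING,
  amount NUMERIC
)
PARTITION BY _PARTITIONDATE;

-- Append the existing rows; they land in the partition for the day they are inserted.
INSERT INTO `your_dataset.your_sample_data_table` (invoice_id, amount)
SELECT invoice_id, amount
FROM `your_dataset.your_old_table`;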
Then you create a view based on that table with:
SELECT * EXCEPT (rank)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY invoice_id ORDER BY _PARTITIONTIME DESC) AS rank
  FROM `your_dataset.your_sample_data_table`
)
WHERE rank = 1
From then on, always query the view.
I have a BigQuery table with daily partitions.
The problem is that one of the partitions, namely the last partition of the month (for example 2019-12-31), contains some data that should belong to the next partition, 2020-01-01.
I want to know if it is possible to take that data out of the 2019-12-31 partition and put it into the 2020-01-01 partition using BigQuery SQL, or do I have to create a Beam job for it?
Yes, using DML; an UPDATE statement can move rows from one partition to another.
Updating data in a partitioned table using DML is the same as updating data from a non-partitioned table.
For example, the following UPDATE statement moves rows from one partition to another. Rows in the May 1, 2017 partition ("2017-05-01") of mycolumntable where field1 is equal to 21 are moved to the June 1, 2017 partition ("2017-06-01").
UPDATE
project_id.dataset.mycolumntable
SET
ts = "2017-06-01"
WHERE
DATE(ts) = "2017-05-01"
AND field1 = 21
I have an external Hive table which is partitioned on load_date (DD-MM-YYYY). However, the very first partition, let's say 01-01-2000, holds all the data from 1980 until 2000. How can I further partition that older data by year while keeping the existing data (data with load_date greater than 01-01-2000) still available?
First load the data of the '01-01-2000' partition into a staging table, then create a table partitioned by year and load the staged data into it using dynamic partitioning. This might solve your problem.
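A minimal sketch of that idea, assuming the rows carry an actual transaction date column (here called txn_date, in DD-MM-YYYY format) from which the year can be derived; all table and column names are hypothetical:

-- Stage the oversized first partition.
CREATE TABLE staging_pre2000 AS
SELECT * FROM my_external_table WHERE load_date = '01-01-2000';

-- New table partitioned by year for the historical data.
CREATE TABLE history_by_year (
  id BIGINT,
  txn_date STRING,
  amount DOUBLE
)
PARTITIONED BY (year STRING);

-- Let Hive create the year partitions on the fly.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The last SELECT column feeds the dynamic partition key.
INSERT OVERWRITE TABLE history_by_year PARTITION (year)
SELECT id, txn_date, amount, substr(txn_date, 7, 4) AS year
FROM staging_pre2000;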