Will BigQuery move records into the right partition? - google-bigquery

I have a table that is partitioned by the timestamp contained in column X. However, during ingestion this value might be NULL, and it is only filled in later with an UPDATE.
Will BigQuery move the record to the right partition after the UPDATE?

Yes, if you execute an UPDATE statement and set the partitioning column to have different timestamps, BigQuery will move the associated rows into the appropriate partitions.
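To make the behaviour concrete, here is a toy model in plain Python. It is only a sketch of the observable effect, not how BigQuery is implemented. It assumes daily partitioning in UTC and that rows with a NULL partitioning column sit in the `__NULL__` partition until they are updated. The class and function names (`ToyPartitionedTable`, `partition_id`, `update_x`) are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Partition id BigQuery uses for rows whose partitioning column is NULL.
NULL_PARTITION = "__NULL__"


def partition_id(ts):
    """Return the daily partition id (YYYYMMDD in UTC), or the NULL partition."""
    if ts is None:
        return NULL_PARTITION
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")


class ToyPartitionedTable:
    """Hypothetical in-memory stand-in for a table partitioned on column x."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, row):
        # Route the row by the current value of its partitioning column.
        self.partitions[partition_id(row["x"])].append(row)

    def update_x(self, row_id, new_ts):
        # Emulates: UPDATE ... SET x = new_ts WHERE id = row_id
        for pid, rows in list(self.partitions.items()):
            for row in rows:
                if row["id"] == row_id:
                    rows.remove(row)
                    if not rows:
                        del self.partitions[pid]
                    row["x"] = new_ts
                    # The row is re-routed based on the new value.
                    self.partitions[partition_id(new_ts)].append(row)
                    return
        raise KeyError(row_id)


table = ToyPartitionedTable()
table.insert({"id": 1, "x": None})
before = sorted(table.partitions)  # row starts in the NULL partition

table.update_x(1, datetime(2024, 5, 17, 9, 30, tzinfo=timezone.utc))
after = sorted(table.partitions)   # row now lives in its date partition
print(before, after)
```

The point of the model is that the partition is a function of the column value, so changing the value changes where the row lives.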

Related

Adding multiple partitioned columns to BigQuery table from SQL query

I've been trying to add multiple partition columns to a BigQuery table, but it seems to take only one field, even if I add multiple partition fields in the query parameters.
I'm partitioning by datetime and by integer range.
It only uses the latter of the pair to create partitions and ignores the first partition field.
Any ideas would be appreciated.
BigQuery only supports partitioning on one column. If you want to partition on multiple columns, you can consider partitioning+clustering. The table can be clustered on up to 4 columns.
I used COALESCE to combine the columns and partitioned on the new field it produced, which worked for my purpose.

BigQuery - Updating a row's Partition Timestamp Value, does this repartition

If I insert rows into BQ using a custom partition column, the rows are placed in the right date partition. Great.
If I subsequently issue a DML UPDATE statement on a bunch of rows, changing the timestamp value of the column used for partitioning, will these rows be re-partitioned based on the new value? Essentially 'moving' them to another partition?
Thanks
Yes, it does
What's more, with an UPDATE statement you can even modify the _PARTITIONTIME pseudo column.

Update in partitioned table

I am trying to update a null value to 0f so it can be used for aggregation.
The following is my code:
update x:0f from data where date=2016.07.01,null x;
but it didn't work on a partitioned table, how am I able to update on a partitioned table?
The "par" error occurs when you try to update a partitioned table, which you can't do like that. Instead you have to generate the updated column and write back to disk.
If you're doing this to all date slices, your best bet might be to use the "fncol" function in the dbmaint utilities (https://github.com/KxSystems/kdb/blob/master/utils/dbmaint.md) to apply a function to the column throughout history. For example
fncol[`:/path/to/db;`data;`x;0f^]

How do I get the last update time of a sequence of tables in BigQuery?

A BigQuery best practice is to split timeseries in daily tables (as "NAME_yyyyMMdd") and then use Table Wildcards to query one or more of these tables.
Sometimes it is useful to get the last update time on a certain set of data (i.e. to check correctness of the ingestion procedure). How do I get the last update time over a set of tables organized like that?
A good way to achieve that is to use the __TABLES__ meta-table. Here is a generic query I use in several projects:
SELECT
MAX(last_modified_time) LAST_MODIFIED_TIME,
IF(REGEXP_MATCH(RIGHT(table_id,8),"[0-9]{8}"),LEFT(table_id,LENGTH(table_id) - 8),table_id) AS TABLE_ID
FROM
[my_dataset.__TABLES__]
GROUP BY
TABLE_ID
It will return the last update time of every table in my_dataset. For tables organized with a daily-split structure, it will return a single value (the update time of the latest table), with the initial part of their name as TABLE_ID.
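The grouping logic of that query can be sketched outside BigQuery as well. The Python below is only an illustration: it assumes you already have `(table_id, last_modified_time)` pairs from somewhere, and the function name `last_modified_by_series` and the sample table names are hypothetical. Like the query, it strips a trailing 8-digit date suffix and keeps the latest modification time per remaining prefix.

```python
import re
from datetime import datetime

# A table id that ends in exactly eight digits (yyyyMMdd) is treated as a daily shard.
_DAILY_SHARD = re.compile(r"(.*)[0-9]{8}")


def last_modified_by_series(tables):
    """Map each table-name prefix to the newest last-modified time among its shards."""
    latest = {}
    for table_id, modified in tables:
        match = _DAILY_SHARD.fullmatch(table_id)
        # Same rule as the query: drop the last 8 characters only if they are all digits.
        series = match.group(1) if match else table_id
        if series not in latest or modified > latest[series]:
            latest[series] = modified
    return latest


# Hypothetical sample metadata, standing in for rows of __TABLES__.
sample = [
    ("events_20240101", datetime(2024, 1, 1, 23, 59)),
    ("events_20240102", datetime(2024, 1, 2, 23, 58)),
    ("users", datetime(2023, 12, 31, 8, 0)),
]
result = last_modified_by_series(sample)
print(result)
```

Note that the prefix keeps whatever separator precedes the date (here `events_`), just as `LEFT(table_id, LENGTH(table_id) - 8)` does.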
SELECT *
FROM project_name.data_set_name.INFORMATION_SCHEMA.PARTITIONS
where table_name='my_table';
Solution for Google

How to apply Partition on hive table which is already partitioned

How to apply Partition on hive table which is already partitioned. I am not able to fetch the partitioned data into the folder after the data is loaded.
The first rule of partitioning in Hive is that the partitioning column should be the last column in the data. Say the data is already partitioned on gender (M/F): two directories, gender=M and gender=F, will be created, each holding the data for that gender, and the last column in that data will again be gender.
If you want to partition an already partitioned table again, use INSERT INTO ... SELECT and make sure the last column you select is the column you want to partition on.
Did you add a partition manually with an HDFS command? In that case the metastore will not keep track of the partitions being added unless you run "alter table add partition "...
Try this:
MSCK REPAIR TABLE table_name;
If that is not the case, try dropping the partitions and creating them again with the ALTER TABLE command, but note that you will lose the data. Also, if you are doing a dynamic partition insert, the partitioning column value must be the last column.