Creating partitioned table from querying partitioned table - google-bigquery

I have an existing date partitioned table and I want to create a new date partitioned table with only one column from the original table while keeping the original partitioning.
I have tried:
Creating an empty partitioned table and copying the results of a query into it, but then the partitioning is lost.
The following CREATE statement, which should work if I also include the partition date as a column in my new table, which I don't want:
CREATE TABLE
cat_dataset.cats_names(cat_name string)
PARTITION BY
part_date AS
SELECT
cat_name,
_PARTITIONDATE AS part_date
FROM
`myproject.cat_dataset.cats`
I want to avoid looping over all the dates and writing the data of each date to the new table. Is there a way to use the part_date column as a partition decorator when loading data from a query result?

INSERT INTO lets you specify _PARTITIONTIME as a column, so you can create the ingestion-time partitioned table first and then fill it from a query. The code below should work:
CREATE TABLE cat_dataset.cats_names(cat_name string)
PARTITION BY DATE(_PARTITIONTIME);
INSERT INTO cat_dataset.cats_names (_PARTITIONTIME, cat_name)
SELECT _PARTITIONTIME, cat_name
FROM `myproject.cat_dataset.cats`
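To check that the rows ended up in the expected partitions, a query along these lines could be run afterwards. This is only a sketch: it assumes the new table is cat_dataset.cats_names as created above, and the aliases pt and row_count are made up for illustration.

```sql
-- Count rows per partition of the new table; the counts should match
-- the same query run against the source table.
SELECT
  _PARTITIONTIME AS pt,
  COUNT(*) AS row_count
FROM cat_dataset.cats_names
GROUP BY pt
ORDER BY pt;
```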

Related

How insert data from a temporary table into partitioned table in oracle/sql using merge statement

I have to write a MERGE statement to insert data from a temporary table into a partitioned table, and I'm getting the error below:
Error report -
SQL Error: ORA-14400: inserted partition key does not map to any partition
I have to do this per session and therefore have to use a temporary table, which cannot be partitioned.
When you insert rows into the partitioned table, Oracle has to place each row
into the matching partition. You must therefore create partitions that cover the
complete period of time in your data, as in this example with quarterly partitions:
ALTER TABLE sales ADD
PARTITION sales_q1_2007 VALUES LESS THAN (TO_DATE('01-APR-2007','dd-MON-yyyy')),
PARTITION sales_q2_2007 VALUES LESS THAN (TO_DATE('01-JUL-2007','dd-MON-yyyy')),
PARTITION sales_q3_2007 VALUES LESS THAN (TO_DATE('01-OCT-2007','dd-MON-yyyy')),
PARTITION sales_q4_2007 VALUES LESS THAN (TO_DATE('01-JAN-2008','dd-MON-yyyy'))
;
Once you have done this, you can insert the data as needed.
Good luck.
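To see which key ranges are already covered before adding partitions, the data dictionary can be queried. A minimal sketch, assuming the table is called SALES as in the example above and is owned by the current user; the temporary table sales_tmp and its column sale_date are hypothetical names:

```sql
-- List existing partitions and their upper bounds.
SELECT partition_name, high_value
FROM user_tab_partitions
WHERE table_name = 'SALES'
ORDER BY partition_position;

-- Compare with the range of partition keys waiting in the temporary table
-- (sales_tmp and sale_date are placeholder names).
SELECT MIN(sale_date), MAX(sale_date)
FROM sales_tmp;
```

Any key at or above the highest HIGH_VALUE has no partition to go to, which is what ORA-14400 reports.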

Using ingestion-time based pseudo-field (_PARTITIONTIME) as partition while clustering

I'd like to cluster our ingestion-time partitioned tables without having to change the ETL scripts we use to update them. All of our tables are partitioned on the pseudo-field _PARTITIONTIME. When I try to cluster a table with DML, I get the following error:
Invalid field name "_PARTITIONTIME". Field names are not allowed to start with the (case-insensitive) prefixes _PARTITION, _TABLE_, _FILE_ and _ROW_TIMESTAMP
Here's what the DML-script looks like:
CREATE TABLE `table_target`
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY a, b, c
AS
SELECT
*, _PARTITIONTIME
FROM
`table_source`
How should I go about this? Is there a way to keep the same pseudo-field as the partition field, should I re-work the partition field, or am I missing something here?
This is a known limitation:
It is not possible to create an ingestion-time partitioned table from the result of a query. Instead, use a CREATE TABLE DDL statement to create the table, and then use an INSERT DML statement to insert data into it.
In your case, you need to use CREATE TABLE to create table_target with CLUSTER BY first, then migrate the data over with INSERT.
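The two steps could look roughly like this. It is a sketch under assumptions: the column types of a, b and c are not given in the question, so STRING is a placeholder, and table_target and table_source stand in for fully qualified table names.

```sql
-- Step 1: create the empty ingestion-time partitioned, clustered table.
-- Column types are assumed; list every column of the source table here.
CREATE TABLE `table_target` (a STRING, b STRING, c STRING)
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY a, b, c;

-- Step 2: copy the rows, carrying each row's partition over explicitly.
INSERT INTO `table_target` (_PARTITIONTIME, a, b, c)
SELECT _PARTITIONTIME, a, b, c
FROM `table_source`;
```

Because _PARTITIONTIME is named in the INSERT column list rather than declared as a regular column, the "Invalid field name" error from the CREATE TABLE ... AS SELECT attempt does not apply.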

Add new partition to already partitioned hive table

I have a partitioned table Student which already has one partition column, dept. I need to add a new partition column, gender.
Is it possible to add this new partition column to an already partitioned Hive table?
The table data does not have a gender column. It is a new constant column to be added to the Hive table.
Partitions are hierarchical folders such as table_location/dept=Accounting/gender=male/, and this folder structure has to exist.
You can easily add a non-partition column as the last column, and it will return NULLs if the data does not contain that column. To add a partition column, however, the easiest way is to create a new table partitioned as you want, INSERT OVERWRITE that table from the old one (selecting the partition columns last), drop the old table, and rename the new one.
See this answer about dynamic partitions load: https://stackoverflow.com/a/48901871/2700344
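The create, reload, drop and rename steps described above might be sketched as follows. The non-partition columns id and name, the table name student_new and the constant 'unknown' are assumptions for illustration, since the question does not show the table definition.

```sql
-- New table with both partition columns (id and name are placeholder columns).
CREATE TABLE student_new (id INT, name STRING)
PARTITIONED BY (dept STRING, gender STRING);

-- Let Hive derive the partition values from the SELECT output.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Partition columns go last, in the order declared in PARTITIONED BY.
INSERT OVERWRITE TABLE student_new PARTITION (dept, gender)
SELECT id, name, dept, 'unknown' AS gender
FROM student;

-- Swap the tables once the reload has been verified.
DROP TABLE student;
ALTER TABLE student_new RENAME TO student;
```

For a managed table, DROP TABLE also deletes the underlying data, so it is worth checking row counts in student_new before running the last two statements.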

Need to change the partition column to another column and reloading the data into new partitions

I am trying to change the already existing partition column to another column.
The current workflow I'm using:
Backup the existing data
Create a new table with new partition column
Reload the data into new partitions
My problem:
Since our existing partitioned tables hold a huge amount of data, this approach is costly.
Is there a way to use ALTER TABLE to change the partition column to another one?
You cannot avoid the one-time cost of scanning the table, and the partitioning cannot be changed in place, as you can see from the error message generated by this CREATE OR REPLACE DDL statement:
#standardSQL
CREATE OR REPLACE TABLE `project.dataset.table`
PARTITION BY DATE(ts)
AS
SELECT * FROM `project.dataset.table`
Cannot replace a table with a different partitioning spec. Instead, DROP the table, and then recreate it. New partitioning spec is interval(type:day,field:ts) and existing spec is none
What you can do to save cost is use a WHERE clause to limit the number of partitions you copy from the existing table to the new table:
-- Copy only the partitions you need (here a single day) to limit the bytes scanned.
CREATE TABLE project.mydataset.newPartitionTable
PARTITION BY date
OPTIONS (
  partition_expiration_days=365,
  description="Table with a new partition"
) AS
SELECT * FROM `project.dataset.table`
WHERE _PARTITIONTIME >= '2019-01-23 00:00:00'
  AND _PARTITIONTIME <= '2019-01-23 00:00:00'
You can consider, for example, not moving your long-term storage, which is data that has not been modified for the last 90 days.
If you want to keep your original table name, you can, after the copy, drop the table and recreate it with the new partition field, then use the copy option from the web UI, which is free of charge.

changing non-partition table to partition table in hive without mentioning all the columns

Hi, my table has 150 columns and I don't want to list the column names when partitioning it. Is there any way to use a temp table to create a partitioned table from a non-partitioned one?
Temporary tables in Hive don't support partitioning.
You cannot create a permanent partitioned table without specifying the partition column.