how to convert a non-partitioned table into a partitioned one - google-bigquery

How to rename a table in BigQuery using Standard SQL or Legacy SQL?
I'm trying with Standard SQL, but it gives the following error:
RENAME TABLE dataset.old_table_name TO dataset.new_table_name;
Statement not supported: RenameStatement at [1:1]
Does this mean there is no method (SQL query) that can rename a table?

I just want to change a non-partitioned table into a partitioned table.
You can achieve this in a two-step process:
Step 1 - Export your table to Google Cloud Storage (GCS)
Step 2 - Load the files from GCS back into BigQuery as a new table with a partitioning column
Both steps are free of charge.
Still, keep in mind some limitations of partitioned tables, such as the number of partitions: at the time of writing it is 4000 per table - https://cloud.google.com/bigquery/quotas#partitioned_tables
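To check that limit before converting, you can count the daily partitions your data would need. A minimal sketch in plain Python, assuming DAY partitioning; the function names are hypothetical, and MAX_PARTITIONS is just the 4000 figure quoted above (check the quotas page for the current value):

```python
from datetime import date

# The per-table partition limit quoted in the answer (4000 at the time of
# writing); check the BigQuery quotas page for the current value.
MAX_PARTITIONS = 4000


def daily_partitions_needed(first_day: date, last_day: date) -> int:
    """Number of DAY partitions needed to cover first_day..last_day inclusive."""
    if last_day < first_day:
        raise ValueError("last_day must not be before first_day")
    return (last_day - first_day).days + 1


def fits_in_one_table(first_day: date, last_day: date) -> bool:
    """True if one daily-partitioned table can hold the whole date range."""
    return daily_partitions_needed(first_day, last_day) <= MAX_PARTITIONS


print(daily_partitions_needed(date(2020, 1, 1), date(2020, 12, 31)))  # 366 (leap year)
print(fits_in_one_table(date(2005, 1, 1), date(2020, 12, 31)))  # False (5844 days)
```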

At the time of writing, it was not possible to rename a table in BigQuery, as explained in this document, so you had to create another table by following the steps given by Mikhail. (BigQuery DDL has since added ALTER TABLE ... RENAME TO.) Note that the new table still incurs storage charges, but they are minimal. See this doc for detailed information.

You can use the query below. It creates a new table, partitioned on the given column, containing the distinct records from the old table:
CREATE OR REPLACE TABLE `dataset.new_table`
PARTITION BY DATE(date_time_column)
AS SELECT DISTINCT * FROM `dataset.old_table`;
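As a rough mental model of what that statement does, here is a toy sketch in plain Python (not BigQuery code): SELECT DISTINCT drops exact duplicate rows, and PARTITION BY DATE(...) puts each remaining row into the partition for the calendar date of its timestamp. The helper name is hypothetical, and the sketch ignores time zones (BigQuery's DATE() of a TIMESTAMP uses UTC by default).

```python
from collections import defaultdict
from datetime import datetime


def partition_by_date(rows, ts_index):
    """Toy model: deduplicate rows (like SELECT DISTINCT *) and bucket them
    by the calendar date of the timestamp column (like PARTITION BY DATE(col))."""
    partitions = defaultdict(list)
    for row in dict.fromkeys(rows):  # drops exact duplicates, keeps first-seen order
        partitions[row[ts_index].date()].append(row)
    return dict(partitions)


rows = [
    ("a", datetime(2020, 5, 30, 9, 15)),
    ("a", datetime(2020, 5, 30, 9, 15)),  # exact duplicate, dropped
    ("b", datetime(2020, 5, 30, 23, 59)),
    ("c", datetime(2020, 5, 31, 0, 1)),
]
parts = partition_by_date(rows, ts_index=1)
for day, day_rows in sorted(parts.items()):
    print(day, len(day_rows))
# 2020-05-30 2
# 2020-05-31 1
```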

Related

Create partitioned table with date suffix in bigquery using SQL or web UI

I want to create a table like this:
CREATE TABLE sometable
(SELECT columns, columns, date_col)
PARTITIONED BY date_col
And I want it to be date-partitioned, with the date in the table suffix: sometable$date_partition
I read the docs, but I can't get this to work with either the web UI or SQL.
The web UI shows this error: "Missing argument for parameter DATE."
My table name is "daily_export_${DATE}"
My partitioning column isn't blank, it's date_col.
Can I have a simple example, please?
PARTITION BY goes earlier, before AS SELECT.
The query also needs to parse the table suffix into a DATE type.
For example:
CREATE OR REPLACE TABLE temp.so
PARTITION BY date_from_table_name
AS
SELECT PARSE_DATE('%Y%m%d', _table_suffix) date_from_table_name, event_timestamp, event_name, items
FROM `bingo-blast-174dd.analytics_151321511.events_*`
WHERE _table_suffix BETWEEN '20200530' AND '20200531'
LIMIT 10
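The suffix filter and the PARSE_DATE call can be mimicked in plain Python to see what each step does; this is a toy model, and the function name is hypothetical. Note that BETWEEN compares _table_suffix as a string, which works here because YYYYMMDD strings sort in date order, and Python's strptime accepts the same %Y%m%d format.

```python
from datetime import datetime


def select_shards(table_names, prefix, low, high):
    """Mimic `WHERE _table_suffix BETWEEN low AND high` over a wildcard
    `prefix*`, then `PARSE_DATE('%Y%m%d', _table_suffix)` on each match."""
    result = {}
    for name in table_names:
        if not name.startswith(prefix):
            continue
        suffix = name[len(prefix):]
        if low <= suffix <= high:  # plain string comparison, as in the SQL
            result[name] = datetime.strptime(suffix, "%Y%m%d").date()
    return result


tables = ["events_20200529", "events_20200530", "events_20200531", "events_20200601"]
print(select_shards(tables, "events_", "20200530", "20200531"))
```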
As you can see in this documentation, BigQuery implements two different concepts: sharded tables and partitioned tables.
The first one (sharded tables) is a way of dividing a whole table into many tables with a date suffix. You can query those tables individually or using wildcards. For example, instead of creating a single table named events, you can create many tables named events_20200101, events_20200102, [...]
When you do that, you are able to query any of those tables individually or you can query all of them by running some query like select * from events_*
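One subtlety of wildcard queries: _TABLE_SUFFIX is whatever the trailing * matched, so any table that merely shares the prefix is included too. A toy sketch using Python's fnmatch (not BigQuery's own matcher); the table names are hypothetical:

```python
from fnmatch import fnmatchcase


def table_suffixes(table_names, pattern):
    """Toy model of a wildcard table query: return {table: _TABLE_SUFFIX} for
    every table matching `pattern`, where the suffix is whatever the trailing
    `*` matched. Assumes the pattern ends with a single '*'."""
    prefix = pattern[:-1]
    return {name: name[len(prefix):] for name in table_names if fnmatchcase(name, pattern)}


tables = ["events_20200101", "events_20200102", "events_intraday_20200103", "users"]
print(table_suffixes(tables, "events_*"))
```

A suffix such as "intraday_20200103" could make PARSE_DATE('%Y%m%d', _table_suffix) fail, which is one reason to filter on _table_suffix before parsing it.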
The second concept (partitioned tables) is an approach to splitting your table into smaller pieces in order to improve performance and reduce costs when querying data. Partitioned tables can be based on a column of your table or on ingestion time. When your table is partitioned by ingestion time, you can access a pseudo column named _PARTITIONTIME.
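Partitioning pays off because a filter on the partitioning column (or on _PARTITIONTIME) lets BigQuery skip partitions outside the requested range instead of scanning the whole table. A toy model of that pruning in plain Python, not BigQuery's engine; the table contents are made up:

```python
from datetime import date

# Toy partitioned table: one bucket of rows per partition date.
table = {
    date(2020, 1, 1): ["r1", "r2"],
    date(2020, 1, 2): ["r3"],
    date(2020, 1, 3): ["r4", "r5", "r6"],
}


def query_with_pruning(table, start, end):
    """Return (matching rows, partitions touched): only partitions whose key
    falls in [start, end] are read; the others are skipped entirely."""
    touched = [d for d in sorted(table) if start <= d <= end]
    rows = [r for d in touched for r in table[d]]
    return rows, len(touched)


rows, scanned = query_with_pruning(table, date(2020, 1, 2), date(2020, 1, 3))
print(rows, scanned)  # ['r3', 'r4', 'r5', 'r6'] 2
```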
When comparing both approaches, the documentation says:
Date/timestamp partitioned tables perform better than tables sharded by date. When you create date-named tables, BigQuery must maintain a copy of the schema and metadata for each date-named table. Also, when date-named tables are used, BigQuery might be required to verify permissions for each queried table. This practice also adds to query overhead and impacts query performance. The recommended best practice is to use date/timestamp partitioned tables instead of date-sharded tables.
In your case, you basically need to create a partitioned table without a date in its name.

Is it possible to change partition metadata in HIVE?

This is an extension of a previous question I asked: How to compare two columns with different data type groups
We are exploring the idea of changing the metadata on the table as opposed to performing a CAST operation on the data in SELECT statements. Changing the metadata in the MySQL metastore is easy enough. But, is it possible to have that metadata change applied to partitions (they are daily)? Otherwise, we might be stuck with current and future data being of type BIGINT while the historical is STRING.
Question: Is it possible to change partition metadata in Hive? If yes, how?
You can change partition column type using this statement:
alter table {table_name} partition column ({column_name} {column_type});
Alternatively, you can re-create the table definition and change all column types using these steps:
Make your table external, so it can be dropped without dropping the data:
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');
Drop the table (only the metadata will be removed).
Create an EXTERNAL table using the updated DDL, with the types changed and the same LOCATION.
Recover the partitions:
MSCK REPAIR TABLE tablename;
The equivalent command in Amazon Elastic MapReduce (EMR)'s version of Hive is:
ALTER TABLE tablename RECOVER PARTITIONS;
This will add the Hive partition metadata. See the manual here: RECOVER PARTITIONS
Finally, you can make your table MANAGED again if necessary:
ALTER TABLE tablename SET TBLPROPERTIES('EXTERNAL'='FALSE');
Note: all commands above should be run through Hive (e.g. in HUE), not in MySQL.
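Conceptually, the repair step compares the key=value directories found under the table's LOCATION with the partitions already registered in the metastore, and adds the missing ones. A toy sketch of that comparison in plain Python; the /warehouse/abc path and the dt partition column are hypothetical, and real Hive handles details this sketch ignores:

```python
def partitions_from_paths(paths, table_root):
    """Parse Hive-style `key=value` directories under table_root into partition specs."""
    root = table_root.rstrip("/") + "/"
    specs = set()
    for path in paths:
        if not path.startswith(root):
            continue
        dirs = path[len(root):].split("/")[:-1]  # drop the data file name
        if dirs and all("=" in d for d in dirs):
            specs.add(tuple(tuple(d.split("=", 1)) for d in dirs))
    return specs


def missing_partitions(paths, table_root, metastore):
    """Partitions present on disk but not yet registered in the metastore."""
    return partitions_from_paths(paths, table_root) - metastore


paths = [
    "/warehouse/abc/dt=2020-01-01/part-0000",
    "/warehouse/abc/dt=2020-01-02/part-0000",
    "/warehouse/abc/dt=2020-01-02/part-0001",
]
known = {(("dt", "2020-01-01"),)}
print(missing_partitions(paths, "/warehouse/abc", known))  # {(('dt', '2020-01-02'),)}
```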
You cannot change the partition column of an existing Hive table in place; Hive does not support altering which columns a table is partitioned by.
See: altering partition column type in Hive
You can think of it this way:
- Hive stores the data by creating a folder in HDFS for each partition column value.
- So altering the partitioning of a Hive table would mean changing the whole directory structure and data of the table, which is not possible in place.
For example, if you have partitioned on year, the directory structure looks like this:
tab1/clientdata/year=2009/file2
tab1/clientdata/year=2010/file3
If you want to change the partition column, you can perform the steps below.
Create another Hive table with the required partition column:
CREATE TABLE new_table (A INT, ...) PARTITIONED BY (B STRING);
Load the data from the previous table (a fully dynamic partition insert needs nonstrict mode, and the partition column goes last in the SELECT):
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE new_table PARTITION (B) SELECT A, ..., B FROM prev_table;
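The layout change is why the data has to be rewritten: every row must land in a directory named after the new partition column. A toy model in plain Python (not Hive code), reusing the A/B columns from the steps above plus a hypothetical year column; unlike real Hive data files, the rows here keep the partition value inline for simplicity:

```python
from collections import defaultdict


def layout(rows, partition_col, root):
    """Toy model of Hive's directory layout: map each `col=value` directory
    under root to the rows stored there."""
    dirs = defaultdict(list)
    for row in rows:
        dirs[f"{root}/{partition_col}={row[partition_col]}"].append(row)
    return dict(dirs)


rows = [
    {"A": 1, "B": "x", "year": 2009},
    {"A": 2, "B": "y", "year": 2009},
    {"A": 3, "B": "x", "year": 2010},
]
old = layout(rows, "year", "prev_table")
new = layout(rows, "B", "new_table")
print(sorted(old))  # ['prev_table/year=2009', 'prev_table/year=2010']
print(sorted(new))  # ['new_table/B=x', 'new_table/B=y']
```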

how to convert an old style partitioned table to a new style partitioned table easily

I have a partitioned table with names like mytable_*, where the suffix denotes the date.
I would now like to convert this to the new way tables are partitioned in BigQuery, i.e. with the _partitiondate column, etc.
I was thinking of creating the new table's schema based on the old one, and then to insert data into it, but I am not sure how to put the date value (that is the suffix) of the old table into the _partitiondate field.
If you have previously created date-sharded tables, you can convert the entire set of related tables into a single ingestion-time partitioned table by using the partition command in the bq command-line tool. The date-sharded tables must use the following naming convention: [TABLE]_YYYYMMDD. For example, mytable_20160101, ... , mytable_20160331.
For this, use the bq partition command, as in the example below:
bq --location=[LOCATION] partition --time_partitioning_type=DAY --time_partitioning_expiration [INTEGER] [PROJECT_ID]:[DATASET].[SOURCE_TABLE]_ [PROJECT_ID]:[DATASET].[DESTINATION_TABLE]
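Because the shards must follow the [TABLE]_YYYYMMDD convention, it may help to check the table names first. A minimal sketch in plain Python; bq partition does the actual conversion, and the regex and function name here are assumptions for illustration only:

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical check for the `[TABLE]_YYYYMMDD` naming convention.
SHARD_RE = re.compile(r"^(?P<prefix>.+_)(?P<day>\d{8})$")


def group_shards(table_names):
    """Group date-sharded table names by prefix (the `[TABLE]_` part passed to
    `bq partition`), keeping only names that end in a valid YYYYMMDD date."""
    groups = defaultdict(list)
    for name in table_names:
        m = SHARD_RE.match(name)
        if not m:
            continue
        try:
            day = datetime.strptime(m["day"], "%Y%m%d").date()
        except ValueError:  # e.g. 20160230 is not a real date
            continue
        groups[m["prefix"]].append(day)
    return {prefix: sorted(days) for prefix, days in groups.items()}


names = ["mytable_20160101", "mytable_20160331", "mytable_20160230", "mytable_backup", "other_20160101"]
groups = group_shards(names)
print(groups)
```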
You can find more details and options in the following articles:
Converting date-sharded tables into ingestion-time partitioned tables
and
bq partition

How to append data to an existing partition in BigQuery table

We can create a partition on BigQuery table while creating a BigQuery table.
I have some questions on the partition.
How to append data to an existing partition in the BigQuery table.
How to create a new partition in an existing BigQuery table if there is already a partition present in that BigQuery table.
How to truncate and load data into a partition in the BigQuery table (overwrite data in a partition in the BigQuery table).
How to append data to an existing partition in the BigQuery table.
Whether you do this from the web UI, with the API, or with any client of your choice, the approach is the same: set your destination table with the respective partition decorator, for example:
yourProject.yourDataset.yourTable$20171010
Note: to append your data, set Write Preference to Append to table.
How to create a new partition in an existing BigQuery table if there is already a partition present in that BigQuery table.
If the partition you set in the destination table's decorator does not exist yet, it will be added for you.
How to truncate and load data into a partition in the BigQuery table (overwrite data in a partition in the BigQuery table).
To truncate and load a specific partition, set Write Preference to Overwrite table.
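The write-preference semantics can be summarized with a toy model: Append to table adds rows to the decorated partition (creating it if needed), while Overwrite table replaces only that partition. In the API these correspond to the WRITE_APPEND and WRITE_TRUNCATE write dispositions; the class below is a hypothetical plain-Python illustration, not the BigQuery client library:

```python
from collections import defaultdict


class ToyPartitionedTable:
    """Toy model of writing to `table$YYYYMMDD` with a write preference."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def write(self, destination, rows, disposition):
        _, sep, partition = destination.partition("$")
        if not sep or len(partition) != 8 or not partition.isdigit():
            raise ValueError("expected a decorator like yourTable$20171010")
        if disposition == "WRITE_TRUNCATE":
            self.partitions[partition] = list(rows)  # only this partition is replaced
        elif disposition == "WRITE_APPEND":
            self.partitions[partition].extend(rows)  # partition is created if missing
        else:
            raise ValueError(f"unsupported disposition: {disposition}")


t = ToyPartitionedTable()
t.write("yourTable$20171010", ["a"], "WRITE_APPEND")
t.write("yourTable$20171010", ["b"], "WRITE_APPEND")
t.write("yourTable$20171011", ["c"], "WRITE_APPEND")
t.write("yourTable$20171010", ["z"], "WRITE_TRUNCATE")
print(dict(t.partitions))  # {'20171010': ['z'], '20171011': ['c']}
```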

How to apply Partition on hive table which is already partitioned

How to apply Partition on hive table which is already partitioned. I am not able to fetch the partitioned data into the folder after the data is loaded.
The first rule of partitioning in Hive is that the partitioning column should be the last column in the data. Say the data is already partitioned on gender (M/F): Hive creates two directories, gender=M and gender=F, each containing the data for that gender, and when you query the table, gender again appears as the last column.
If you want to partition the data again on an already-partitioned table, use INSERT INTO ... SELECT and make sure the last column(s) in the SELECT are the partition column(s) you want to partition on.
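For a multi-level dynamic insert, Hive requires the partition columns to come last in the SELECT list, in the same order as in the PARTITION (...) clause. A toy sketch of how those trailing values map to nested directories; the function name and the country column are hypothetical, and this is not Hive code:

```python
def dynamic_partition_path(select_row, partition_cols):
    """Toy model of a Hive dynamic-partition insert: the trailing values of the
    SELECT list fill the PARTITION (...) columns, in order, and become nested
    `col=value` directories; the leading values are the regular data columns."""
    n = len(partition_cols)
    if n == 0 or len(select_row) <= n:
        raise ValueError("need at least one data column plus the partition columns")
    data, part_values = select_row[:-n], select_row[-n:]
    path = "/".join(f"{c}={v}" for c, v in zip(partition_cols, part_values))
    return path, data


print(dynamic_partition_path(("alice", 30, "F", "US"), ["gender", "country"]))
# ('gender=F/country=US', ('alice', 30))
```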
Did you add a partition manually with an HDFS command? In that case, the metastore will not keep track of the added partitions unless you run ALTER TABLE ... ADD PARTITION.
Try this:
MSCK REPAIR TABLE table_name;
If that is not the case, try dropping the partitions and creating them again with ALTER TABLE, but you will lose the data. Also, the partitioning column should be the last column if you are doing a dynamic partition insert.