Unable to load data into Ingestion time partitioned table - google-bigquery

I have created a blank ingestion-time partitioned table, clustered by four columns, and I am unable to load data into it. The load fails with the following message: "Omitting INSERT target column list is unsupported for ingestion-time partitioned table". How do I solve this?

I think you're trying to insert data into the table without specifying the column names:
insert into `project-id.dataset_id.target`
select col1,col2,col3 from `project-id.dataset_id.source`;
If the target is an ingestion-time partitioned table, this won't work.
You have to specify the columns explicitly:
insert into `project-id.dataset_id.target`(col1,col2,col3)
select col1,col2,col3 from `project-id.dataset_id.source`;

Related

How to insert data from a temporary table into a partitioned table in Oracle/SQL using a merge statement

I have to write a merge statement to insert data from a temporary table into a partitioned table, and I'm getting the error below:
Error report -
SQL Error: ORA-14400: inserted partition key does not map to any partition
I have to do it session-wise and, as a result, have to use a temporary table, which cannot be partitioned.
When you insert rows into a partitioned table, Oracle has to place each row into the correct partition. You must create partitions covering the complete period of time, as in this example of quarterly partitions:
ALTER TABLE sales ADD
PARTITION sales_q1_2007 VALUES LESS THAN (TO_DATE('01-APR-2007','dd-MON-yyyy')),
PARTITION sales_q2_2007 VALUES LESS THAN (TO_DATE('01-JUL-2007','dd-MON-yyyy')),
PARTITION sales_q3_2007 VALUES LESS THAN (TO_DATE('01-OCT-2007','dd-MON-yyyy')),
PARTITION sales_q4_2007 VALUES LESS THAN (TO_DATE('01-JAN-2008','dd-MON-yyyy'))
;
If you have done this, you can insert the data as needed.
Good luck,
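Once the missing partitions exist, the original merge from the temporary table should map every row to a partition. Here is a minimal sketch, assuming hypothetical names not taken from the question (a temporary table tmp_sales and columns sale_id, sale_date, amount):
-- tmp_sales, sale_id, sale_date and amount are assumed names for illustration
MERGE INTO sales tgt
USING tmp_sales src
ON (tgt.sale_id = src.sale_id)
WHEN MATCHED THEN
UPDATE SET tgt.amount = src.amount
WHEN NOT MATCHED THEN
INSERT (sale_id, sale_date, amount)
VALUES (src.sale_id, src.sale_date, src.amount);
Every src.sale_date must fall inside one of the partitions created above, otherwise ORA-14400 is raised again.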

Using ingestion-time based pseudo-field (_PARTITIONTIME) as partition while clustering

I'd like to cluster our ingestion-time partitioned tables without having to change the ETL scripts we use to update them. All of our tables are partitioned on the pseudo-field _PARTITIONTIME. Now, when I try to cluster a table with DML, I get the following error:
Invalid field name "_PARTITIONTIME". Field names are not allowed to start with the (case-insensitive) prefixes _PARTITION, TABLE, FILE and _ROW_TIMESTAMP
Here's what the DML-script looks like:
CREATE TABLE `table_target`
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY a, b, c
AS
SELECT
*, _PARTITIONTIME
FROM
`table_source`
How should I go about this? Is there a way to keep the same pseudo-field as the partition field, should I re-work the partition field, or am I missing something here?
It is a known limitation that:
It is not possible to create an ingestion-time partitioned table from the result of a query. Instead, use a CREATE TABLE DDL statement to create the table, and then use an INSERT DML statement to insert data into it.
In your case, you need to use CREATE TABLE to create table_target with CLUSTER BY first, then migrate the data over, as in the sketch below.
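A minimal sketch of this two-step approach, assuming table_source has columns a, b, and c of type STRING (the actual column list and types must match your source table):
-- 1. create an empty ingestion-time partitioned, clustered table
CREATE TABLE `table_target`
(a STRING, b STRING, c STRING)
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY a, b, c;
-- 2. copy the data, carrying the original partition timestamps over
INSERT INTO `table_target` (_PARTITIONTIME, a, b, c)
SELECT _PARTITIONTIME, a, b, c
FROM `table_source`;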

Is it possible to change partition metadata in HIVE?

This is an extension of a previous question I asked: How to compare two columns with different data type groups
We are exploring the idea of changing the metadata on the table as opposed to performing a CAST operation on the data in SELECT statements. Changing the metadata in the MySQL metastore is easy enough. But, is it possible to have that metadata change applied to partitions (they are daily)? Otherwise, we might be stuck with current and future data being of type BIGINT while the historical is STRING.
Question: Is it possible to change partition metadata in Hive? If yes, how?
You can change the partition column type using this statement:
alter table {table_name} partition column ({column_name} {column_type});
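For example, matching the question's STRING-to-BIGINT case with hypothetical names (events, event_day):
-- hypothetical table and column names; this changes the metastore
-- definition only, the underlying data files are not rewritten
ALTER TABLE events PARTITION COLUMN (event_day BIGINT);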
Also, you can re-create the table definition and change all the column types using these steps:
Make your table external, so it can be dropped without dropping the data:
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');
Drop the table (only the metadata will be removed).
Create an EXTERNAL table using the updated DDL, with the types changed and with the same LOCATION.
Recover the partitions:
MSCK [REPAIR] TABLE tablename;
The equivalent command on Amazon Elastic MapReduce (EMR)'s version of Hive is:
ALTER TABLE tablename RECOVER PARTITIONS;
This will add the Hive partition metadata. See the manual here: RECOVER PARTITIONS
And finally, you can make your table MANAGED again if necessary:
ALTER TABLE tablename SET TBLPROPERTIES('EXTERNAL'='FALSE');
Note: all of the commands above should be run in HUE, not in MySQL.
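Putting the steps together, a minimal sketch of the whole sequence; the table abc, its columns, and the LOCATION path are hypothetical and must be replaced with your own DDL:
-- 1. make the table external so the data survives the drop
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');
-- 2. drop the table; only the metadata is removed
DROP TABLE abc;
-- 3. re-create it with the changed column types, pointing at the same data
CREATE EXTERNAL TABLE abc (id BIGINT, payload STRING)  -- assumed columns, types changed as needed
PARTITIONED BY (dt STRING)
LOCATION '/user/hive/warehouse/abc';                   -- must match the original location
-- 4. re-register the existing partition directories
MSCK REPAIR TABLE abc;
-- 5. optionally make the table managed again
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='FALSE');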
You cannot change the partition column in Hive; in fact, Hive does not support altering partitioning columns.
Refer to: altering partition column type in Hive
You can think of it this way:
- Hive stores the data by creating a folder in HDFS for each partition column value.
- So if you try to alter the Hive partitioning, you are trying to change the whole directory structure and data of the Hive table, which is not possible.
For example, if you have partitioned on year, this is how the directory structure looks:
tab1/clientdata/2009/file2
tab1/clientdata/2010/file3
If you want to change the partition column, you can perform the steps below (see the sketch after them).
Create another Hive table with the required change in the partition column:
CREATE TABLE new_table (A INT, ...) PARTITIONED BY (B STRING);
Load the data from the previous table with a dynamic partition insert:
INSERT INTO new_table PARTITION (B) SELECT A, B FROM Prev_table;
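A dynamic partition insert like this usually requires dynamic partitioning to be enabled first; a minimal sketch using the same illustrative table names:
-- allow Hive to derive the partition value of B from the data
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- the partition column B must come last in the SELECT list
INSERT INTO new_table PARTITION (B)
SELECT A, B FROM Prev_table;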

Creating partitioned table from querying partitioned table

I have an existing date partitioned table and I want to create a new date partitioned table with only one column from the original table while keeping the original partitioning.
I have tried:
Creating an empty partitioned table and copying the results in from a query, but then the partitioning is missing.
The following CREATE statement would only work if I also included the partition date as a column in my new table, which I don't want:
CREATE TABLE
cat_dataset.cats_names(cat_name string)
PARTITION BY
part_date AS
SELECT
cat_name,
_PARTITIONDATE AS part_date
FROM
`myproject.cat_dataset.cats`
I want to avoid looping over all the dates and writing the data for each date to the new table. Is there a way to use the part_date column as a partition decorator when loading the data from a query result?
INSERT INTO allows you to specify _PARTITIONTIME as a column (see link). The code below should work:
CREATE TABLE cat_dataset.cats_names(cat_name string)
PARTITION BY DATE(_PARTITIONTIME);
INSERT INTO cat_dataset.cats_names (_PARTITIONTIME, cat_name)
SELECT _PARTITIONTIME, cat_name
FROM `myproject.cat_dataset.cats`

How to apply partitioning on a Hive table which is already partitioned

How do I apply partitioning on a Hive table which is already partitioned? I am not able to fetch the partitioned data into the folder after the data is loaded.
The first rule of partitioning in Hive is that the partitioning column should be the last column in the data. Say we are partitioning data on gender (M/F): two directories, gender=M and gender=F, will be created, the respective gender's data will be available inside each of those directories, and the last column in that data will again be gender.
If you want to partition data again on an already partitioned table, use an INSERT INTO ... SELECT and make sure the last column you select is the partition column you want to apply to the partitioned data, as in the sketch below.
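A minimal sketch of such a dynamic partition insert; the tables staging_people and people and the columns name, age, and gender are hypothetical:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- gender is the partition column, so it comes last in the SELECT list
INSERT INTO TABLE people PARTITION (gender)
SELECT name, age, gender
FROM staging_people;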
Did you add a partition manually with an HDFS command? In that case, the metastore will not keep track of the partitions being added unless you run "alter table add partition"...
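For example, a manually created HDFS directory can be registered like this (the partition spec and path are hypothetical):
-- hypothetical partition value and location
ALTER TABLE table_name ADD PARTITION (dt='2020-01-01')
LOCATION '/user/hive/warehouse/table_name/dt=2020-01-01';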
Alternatively, try this:
MSCK REPAIR TABLE table_name;
If that is not the case, then try to drop the partitions and create them again; use the ALTER TABLE command to do this, but note that you will lose the data. Also, your partitioning column value should be the last column in the select list if you are doing a dynamic partition insert.