Hive query searching for a partition which doesn't exist - hive

I have a partitioned table with a single partition, i.e. 030220. I want to insert this data into another table using insert/select. Before inserting the data, I'm just selecting it from the source table with the query below:
SELECT col1,col2....partition_column(date1) FROM table_name;
but I'm getting an error like ../user/hive/warehouse/...dbname.db/tablename/date1=040220 file doesn't exist.
I'm not sure why it is searching for that partition, which is not available in my table. Can someone please suggest what is wrong here?

It seems you created the 040220 partition earlier and later deleted it, so the metastore still holds a partition whose directory is gone. In that scenario, run the query below to repair your table:
MSCK REPAIR TABLE table_name;
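If the repair alone does not clear the error (on many Hive versions MSCK only adds missing partitions), you can drop the stale partition metadata explicitly; a minimal sketch, using the partition value from the error message:
ALTER TABLE table_name DROP IF EXISTS PARTITION (date1='040220');
On Hive 3.x, MSCK REPAIR TABLE table_name SYNC PARTITIONS also removes metastore partitions whose directories no longer exist.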

Related

Using ingestion-time based pseudo-field (_PARTITIONTIME) as partition while clustering

I'd like to cluster our ingestion-time partitioned tables without having to change the ETL scripts we use to update them. All of our tables are partitioned on the pseudo-field _PARTITIONTIME, and when I try to cluster a table with DML I get the following error:
Invalid field name "_PARTITIONTIME". Field names are not allowed to start with the (case-insensitive) prefixes _PARTITION, TABLE, FILE and _ROW_TIMESTAMP
Here's what the DML script looks like:
CREATE TABLE `table_target`
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY a, b, c
AS
SELECT
*, _PARTITIONTIME
FROM
`table_source`
How should I go about this? Is there a way to keep the same pseudo-field as the partition field, should I re-work the partition field, or am I missing something here?
It is a known limitation that:
It is not possible to create an ingestion-time partitioned table from the result of a query. Instead, use a CREATE TABLE DDL statement to create the table, and then use an INSERT DML statement to insert data into it.
In your case, you need to use CREATE TABLE to create table_target with CLUSTER BY first, then migrate the data over.
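A minimal sketch of that two-step approach, assuming placeholder types for a, b and c (BigQuery DML lets you set _PARTITIONTIME explicitly when inserting into an ingestion-time partitioned table, so each row keeps its original partition):
CREATE TABLE `table_target` (a INT64, b STRING, c STRING)
PARTITION BY _PARTITIONDATE
CLUSTER BY a, b, c;
INSERT INTO `table_target` (_PARTITIONTIME, a, b, c)
SELECT _PARTITIONTIME, a, b, c
FROM `table_source`;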

Is it possible to change partition metadata in HIVE?

This is an extension of a previous question I asked: How to compare two columns with different data type groups
We are exploring the idea of changing the metadata on the table as opposed to performing a CAST operation on the data in SELECT statements. Changing the metadata in the MySQL metastore is easy enough. But, is it possible to have that metadata change applied to partitions (they are daily)? Otherwise, we might be stuck with current and future data being of type BIGINT while the historical is STRING.
Question: Is it possible to change partition meta data in HIVE? If yes, how?
You can change the partition column type using this statement:
ALTER TABLE {table_name} PARTITION COLUMN ({column_name} {column_type});
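For example (the table and column names here are hypothetical):
ALTER TABLE my_table PARTITION COLUMN (part_key BIGINT);
Note that this changes the type of a partition key column; for regular columns, ALTER TABLE ... CHANGE COLUMN ... CASCADE applies the type change to existing partition metadata as well.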
Also, you can re-create the table definition and change all column types using these steps:
Make your table external, so it can be dropped without dropping the data
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');
Drop the table (only the metadata will be removed).
Create an EXTERNAL table using the updated DDL, with the types changed and with the same LOCATION (see the sketch after these steps).
Recover the partitions:
MSCK [REPAIR] TABLE tablename;
The equivalent command on Amazon Elastic MapReduce (EMR)'s version of Hive is:
ALTER TABLE tablename RECOVER PARTITIONS;
This will add the Hive partition metadata. See the manual here: RECOVER PARTITIONS
And finally, you can make your table MANAGED again if necessary:
ALTER TABLE tablename SET TBLPROPERTIES('EXTERNAL'='FALSE');
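A minimal sketch of the re-create step (the columns, storage format, and location are assumptions; the key point is that LOCATION must point at the existing data):
CREATE EXTERNAL TABLE abc (id BIGINT, payload STRING)
PARTITIONED BY (dt STRING)
STORED AS ORC
LOCATION '/user/hive/warehouse/mydb.db/abc';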
Note: All commands above should be run in HUE, not MySQL.
You cannot change the partition column in Hive; in fact, Hive does not support altering partitioning columns.
Refer to: altering partition column type in Hive
You can think of it this way:
- Hive stores the data by creating folders in HDFS named after the partition column values
- If you try to alter a Hive partition, you are effectively trying to change the whole directory structure and data of the Hive table, which is not possible
For example, if you have partitioned on year, this is how the directory structure looks:
tab1/clientdata/2009/file2
tab1/clientdata/2010/file3
If you want to change the partition column, you can perform the steps below.
Create another Hive table with the required changes in the partition column:
CREATE TABLE new_table (A int, ...) PARTITIONED BY (B string);
Load the data from the previous table:
INSERT INTO new_table PARTITION (B) SELECT A, B FROM prev_table;
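Note that a dynamic-partition insert like the one above usually also requires these standard session settings:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;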

What is an efficient way to bulk copy data from a CLOB column to a VARCHAR2 column in Oracle

I have a table TEST that has 41 million+ records in it.
I have two main columns in this table that I am interested in:
MESSAGE of type CLOB
MESSAGE_C of type VARCHAR2(2048)
The table TEST is range-partitioned on a column named PART_DATE, where one partition holds one day of data.
I tried using the below to get the job done:
ALTER TABLE TEST ADD MESSAGE_C VARCHAR2(2048);
UPDATE TEST SET MESSAGE_C = MESSAGE;
COMMIT;
ALTER TABLE TEST DROP COLUMN MESSAGE;
ALTER TABLE TEST RENAME COLUMN MESSAGE_C TO MESSAGE;
But I got stuck on step 2 for around 4 hours. Our DBA said there was blocking due to full table scans.
Can someone please tell me:
What would be a better/more efficient way to get this done?
Would using the PART_DATE field in the where clause of the update query help?
Consider using a CREATE TABLE ... AS SELECT (CTAS) to create the new table on the fly with a new name, then add the indexes after creating the table, drop the old table, and rename the new table to the old name.
A bulk operation like this is significantly faster than a row-by-row UPDATE, and with NOLOGGING it isn't slowed down by redo logging.
I've used this approach recently to alter tables with 500 million records.
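A minimal sketch of that approach for the asker's table (only the two named columns are shown; the NOLOGGING/PARALLEL options and the 2048-character truncation via DBMS_LOB.SUBSTR are assumptions, and a range-partitioned target would need its PARTITION BY clause restated):
CREATE TABLE TEST_NEW NOLOGGING PARALLEL AS
SELECT PART_DATE,
       CAST(DBMS_LOB.SUBSTR(MESSAGE, 2048, 1) AS VARCHAR2(2048)) AS MESSAGE
FROM TEST;
-- recreate indexes and constraints on TEST_NEW here, then swap the names
DROP TABLE TEST;
ALTER TABLE TEST_NEW RENAME TO TEST;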

How to convert a non-partitioned table into a partitioned one

How do you rename a table in BigQuery using StandardSQL or LegacySQL?
I'm trying with StandardSQL, but it gives the following error:
RENAME TABLE dataset.old_table_name TO dataset.new_table_name;
Statement not supported: RenameStatement at [1:1]
Does this mean there is no method (SQL query) which can rename a table?
I just want to change a non-partitioned table into a partitioned table.
You can achieve this in a two-step process:
Step 1 - Export your table to Google Cloud Storage
Step 2 - Load the file from GCS back into BigQuery as a new table with a partition column
Both steps are free of charge
Still, keep in mind some limitations of partitioned tables - like the number of partitions, for example - it is 4,000 per table as of today - https://cloud.google.com/bigquery/quotas#partitioned_tables
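If you prefer to stay in SQL, newer BigQuery versions expose the same two steps as EXPORT DATA and LOAD DATA statements (the bucket path and partition column below are placeholders):
EXPORT DATA OPTIONS (
  uri = 'gs://my_bucket/old_table/*.avro',
  format = 'AVRO'
) AS
SELECT * FROM `dataset.old_table`;
LOAD DATA INTO `dataset.new_table`
PARTITION BY DATE(date_time_column)
FROM FILES (
  format = 'AVRO',
  uris = ['gs://my_bucket/old_table/*.avro']
);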
Currently it is not possible to rename a table in BigQuery, as explained in this document. You will have to create another table by following the steps given by Mikhail. Note that there is still some charge for table storage, but it is minimal. See this doc for detailed information.
You can use the query below; it will create a new table with the distinct records from the old table, partitioned on the given column:
CREATE OR REPLACE TABLE `dataset.new_table`
PARTITION BY DATE(date_time_column)
AS SELECT DISTINCT * FROM `dataset.old_table`;

Will hive dynamic partitioning update all partitions?

I want to use Hive dynamic partitioning to overwrite a partitioned table "page_view":
INSERT OVERWRITE TABLE page_view PARTITION(date)
SELECT pvs.viewTime, pvs.date FROM page_view_stg pvs
My question is: if the table "page_view_stg" only has data for "date=2017-01-01", but the destination table "page_view" has a partition "date=2017-01-02", will the partition "date=2017-01-02" get dropped after running this query or not? If not, how should I handle this case using dynamic partitioning?
Thanks
A query with dynamic partitioning will overwrite only the partitions that exist in the source dataset. In your case, the partition "date=2017-01-02" will remain unchanged if the source table does not contain that date. If you want to drop it, the fastest method is to execute an ALTER TABLE ... DROP PARTITION statement, because this is a metadata-only operation. Select the partitions from the target table which do not exist in the source and generate the drop statements using a shell script. Alternatively, insert into a new table, drop the old target, then rename.
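For example, the metadata-only drop would look like this (partition value taken from the question):
ALTER TABLE page_view DROP IF EXISTS PARTITION (`date`='2017-01-02');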