Hive to BigQuery: converting INSERT OVERWRITE TABLE with PARTITION on an integer column

I am trying to convert the following Hive query to BigQuery with little luck. The idea is to remove the records from the specified partition and insert new records into the partition without touching other partitions. I have seen Google's documentation on using a DML statement to add rows to an ingestion-time partitioned table, but this isn't what I'm trying to accomplish.
INSERT OVERWRITE TABLE mytable PARTITION (integer_id = 100) select tmp.*, NULL as value from (select * from mytable2) as tmp;
Any help would be greatly appreciated!
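One way to picture the semantics being asked for (empty a single integer partition, then refill it, leaving the others alone) is a DELETE followed by an INSERT ... SELECT inside one transaction. The sketch below is only an illustration of that pattern: it uses Python's built-in sqlite3 as a stand-in engine, not BigQuery, and the column `name` and the sample rows are hypothetical. Whether and how this maps onto BigQuery DML for an integer-range partitioned table is an assumption to check against the BigQuery documentation.

```python
import sqlite3


def overwrite_partition(conn, partition_id):
    """Replace all rows whose integer_id equals partition_id; other ids are untouched."""
    with conn:  # one transaction: commits on success, rolls back on error
        # Step 1: remove the existing rows of the target "partition".
        conn.execute("DELETE FROM mytable WHERE integer_id = ?", (partition_id,))
        # Step 2: refill it from mytable2, with NULL as value (as in the Hive query).
        conn.execute(
            "INSERT INTO mytable (name, value, integer_id) "
            "SELECT name, NULL, ? FROM mytable2",
            (partition_id,),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, value TEXT, integer_id INTEGER)")
conn.execute("CREATE TABLE mytable2 (name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("old_a", "x", 100), ("old_b", "y", 100), ("keep", "z", 200)],
)
conn.executemany("INSERT INTO mytable2 VALUES (?)", [("new_a",), ("new_b",)])
conn.commit()

overwrite_partition(conn, 100)
result = conn.execute(
    "SELECT integer_id, name, value FROM mytable ORDER BY integer_id, name"
).fetchall()
print(result)
```

Running both statements in one transaction means readers never see the partition half-empty, which is the property INSERT OVERWRITE gives in Hive.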

Related

Listing all the partitions from BigQuery partitioned table with require_partition_filter

I am trying to find a way to list the partitions of a table created with require_partition_filter = true, but I have not found one yet.
This is the table creation script:
CREATE TABLE mydataset.partitionedtable_partitiontime
(
x INT64
)
PARTITION BY DATE(_PARTITIONTIME)
OPTIONS(
require_partition_filter = true
);
Some test rows
INSERT INTO mydataset.partitionedtable_partitiontime (_PARTITIONTIME, x) SELECT TIMESTAMP("2017-05-01"), 10;
INSERT INTO mydataset.partitionedtable_partitiontime (_PARTITIONTIME, x) SELECT TIMESTAMP("2017-04-01"), 20;
INSERT INTO mydataset.partitionedtable_partitiontime (_PARTITIONTIME, x) SELECT TIMESTAMP("2017-03-01"), 30;
As expected, if I try the following query to get the partitions, I get an error, because I need to use a filter on the partitioning column:
SELECT _PARTITIONTIME as pt, FORMAT_TIMESTAMP("%Y%m%d", _PARTITIONTIME) as partition_id
FROM `mydataset.partitionedtable_partitiontime`
GROUP BY _PARTITIONTIME
ORDER BY _PARTITIONTIME
Error
Cannot query over table 'mydataset.partitionedtable_partitiontime' without a filter over column(s) '_PARTITION_LOAD_TIME', '_PARTITIONDATE', '_PARTITIONTIME' that can be used for partition elimination
Any ideas on how to list the partitions?
EDIT: I know that it is possible to add the filter, but I am looking for a solution like "SHOW PARTITIONS TABLENAME" of Hive to list all the partitions (which are essentially metadata)
Thanks!
Here is the way to do it (this is Legacy SQL, since the $ partition decorator is not supported in Standard SQL):
SELECT * FROM [mydataset.partitionedtable_partitiontime$__PARTITIONS_SUMMARY__]
The bigquery.jobs.create permission is required.
EDIT: It is now possible to get this information using Standard SQL:
SELECT * FROM `myproject.mydataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'partitionedtable'
As mentioned by hlagos, you can get this data by querying the _PARTITIONTIME pseudo column if you are using Standard SQL, or the __PARTITIONS_SUMMARY__ meta table if you are using Legacy SQL.
You can take a look at the GCP documentation, which contains detailed information about using this partitioned-table metadata.

Get actual target table insert count

I'm inserting data into hive external table in append mode. Every time I insert some records in a table, I want to get the count of actual records which are inserted into the hive external table. Is there any way I could find this information in any hive log file?
There is a workaround for this. I am not aware of any Hive property that gives you this directly.
1. Add a timestamp column to your table.
2. Do a self join on the table on the timestamp column.
3. Count the latest records inserted into the table. You can check the sample query below:
SELECT count(1) from (
SELECT tbl_alias.* FROM test_table tbl_alias JOIN
( select max(timestamp_date) as max_timestamp_date FROM test_table) max_timestamp_date_table ON
tbl_alias.timestamp_date=max_timestamp_date_table.max_timestamp_date ) outer_table;
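To see what this query returns, here is a small runnable sketch. It uses Python's built-in sqlite3 as a stand-in for Hive, and the `id` column and the sample batches are made up for the illustration. It also shows the assumption the workaround relies on: every row of one insert batch must carry exactly the same timestamp value, otherwise only part of the batch is counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INTEGER, timestamp_date TEXT)")

# First append: two rows sharing one load timestamp.
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)",
    [(1, "2020-01-01 00:00:00"), (2, "2020-01-01 00:00:00")],
)
# Second append: three rows sharing a later load timestamp.
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)",
    [(3, "2020-01-02 00:00:00"), (4, "2020-01-02 00:00:00"), (5, "2020-01-02 00:00:00")],
)
conn.commit()

# Same shape as the query above: join each row to the single max-timestamp row.
latest_count = conn.execute(
    """
    SELECT count(1) FROM (
        SELECT tbl_alias.* FROM test_table tbl_alias
        JOIN (SELECT max(timestamp_date) AS max_timestamp_date FROM test_table)
             max_timestamp_date_table
          ON tbl_alias.timestamp_date = max_timestamp_date_table.max_timestamp_date
    ) outer_table
    """
).fetchone()[0]
print(latest_count)
```

Here only the three rows of the second append match the maximum timestamp, so the count reflects the most recent insert rather than the whole table.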

hadoop hive insert query to insert all rows of one table to another table

I want to insert all rows of one Hive table into another Hive table:
insert into table <table_name> as select * from <table_bkp>
The source table has many rows, but only one row is being inserted.
Please suggest a solution. I am using Hive 1.2.1.
In your query, remove 'as' and write the query as follows:
insert into table <table_name> select * from <table_bkp>

How to insert the columns of an unpartitioned table into a partitioned table in Hive?

Table 'A' is partitioned. Another table, 'B', is not partitioned. How do I insert the values of B into A? Will an error be thrown?
Yes, you can insert from a non-partitioned table to a partitioned table. You will either have to define the partition you want to insert into or have Hive do it dynamically.
For example, to dynamically insert into partitions, you could run something similar to:
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE A PARTITION (partition) SELECT col1, col2, ..., colN, partition FROM B WHERE .... ;
More information on Hive partitions with dynamic inserts can be found here: https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions. Note that the last column in your SELECT is the one used as the partition value for the insert. Also note that the number of columns must match between the two tables; otherwise you will have to fill in NULLs.

merging data from old table into new for a monthly archive

I have a SQL statement that inserts data into a table for archiving, but I need a merge statement to run on a monthly basis to update the new table (2) with any data that changed in the old table (1) and should now be moved into the archive.
Part of the issue is removing the moved data from the old table. My insert does not do that, but I need the archived data to be purged from the original table.
Is there a single SQL statement that will move data out of one table into another in this way? Or does it need to be a two-step operation?
The initial statement moved data depending on age and a few other related factors.
insert is:
INSERT /*+ append */
INTO tab1
SELECT *
FROM tab2
WHERE (Postingdate < TO_DATE ('2001/07/01', 'yyyy/mm/dd')
OR jobname IS NULL)
AND STATUS <> '45';
All help appreciated...
The merge statement will let you do this in one statement by adding a delete statement in the update clause. See Oracle Documentation on Merge.
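For comparison, the two-step version the question asks about can be sketched as an INSERT ... SELECT followed by a DELETE with the same predicate, both inside one transaction. This is not the MERGE approach (as far as I know, the DELETE clause of an Oracle MERGE acts on rows of the merge target, so check the documentation before relying on it to purge the source table). The sketch runs on Python's built-in sqlite3 rather than Oracle, and the column layout and sample rows are hypothetical; only the table names and the WHERE condition come from the question.

```python
import sqlite3

# Same condition for both steps, so exactly the copied rows get purged.
PREDICATE = "(postingdate < ? OR jobname IS NULL) AND status <> '45'"


def archive_rows(conn, cutoff):
    """Copy matching rows from tab2 into tab1, then delete them from tab2."""
    with conn:  # one transaction: both steps commit together or not at all
        conn.execute(f"INSERT INTO tab1 SELECT * FROM tab2 WHERE {PREDICATE}", (cutoff,))
        deleted = conn.execute(f"DELETE FROM tab2 WHERE {PREDICATE}", (cutoff,)).rowcount
    return deleted


conn = sqlite3.connect(":memory:")
for table in ("tab1", "tab2"):
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER, postingdate TEXT, jobname TEXT, status TEXT)"
    )
conn.executemany(
    "INSERT INTO tab2 VALUES (?, ?, ?, ?)",
    [
        (1, "2001-01-15", "jobA", "10"),  # old enough: moved
        (2, "2001-08-01", "jobB", "10"),  # too recent: stays
        (3, "2001-09-01", None, "20"),    # no jobname: moved
        (4, "2000-05-05", "jobC", "45"),  # status 45: stays
    ],
)
conn.commit()

moved = archive_rows(conn, "2001-07-01")
archived_ids = [r[0] for r in conn.execute("SELECT id FROM tab1 ORDER BY id")]
remaining_ids = [r[0] for r in conn.execute("SELECT id FROM tab2 ORDER BY id")]
print(moved, archived_ids, remaining_ids)
```

Keeping the predicate in one place avoids the classic mistake where the DELETE condition drifts from the INSERT condition and rows are either lost or left behind.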
I think you should try this with a partitioned table. My idea is to create a table that is range-partitioned on the date column. Partition bounds have to be literals, so a fixed cutoff date is used here instead of sysdate-30, and <table_name> and the varchar2 length are placeholders:
create table <table_name> (id number primary key, name varchar2(100), j_date date)
partition by range (j_date) (
partition one_mnth values less than (to_date('2001/07/01', 'yyyy/mm/dd')),
partition rest values less than (maxvalue)
);
Then move the one_mnth partition into another table and truncate that partition.