Is there a workaround to my attempted Hive insert?

I copy the structure of schema2.card_master over to schema1.card_master using
hive> create table schema1.card_master like schema2.card_master;
That works, and the new table is partitioned on the same field as the original. The table has hundreds of fields, so listing them out is inconvenient, but I want all of them populated from the original table, filtered through a JOIN:
hive> insert overwrite table schema1.card_master (select * from schema2.card_master ccm INNER JOIN schema1.accounts da on ccm.cm13 = da.cm13);
FAILED: SemanticException 1:23 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'cmdl_card_master'
I checked the partition that was copied over, and it was a field mkt_cd that could take on 2 values, US or PR.
So I try
hive> insert overwrite table schema1.card_master PARTITION (mkt_cd='US') (select * from schema2.card_master ccm INNER JOIN schema1.accounts da on ccm.cm13 = da.cm13);
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different ''US'': Table insclause-0 has 255 columns, but query has 257 columns.
hive>
What is going on here? Is there any workaround to load my data without having to explicitly list all the fields of schema2.card_master in the SELECT statement?

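select * selects the columns of every table in the join, which is why the query returns more columns than the target table has. Use select ccm.* instead of select * to select columns from the ccm table only. Also remove the static partition specification (mkt_cd='US') and use a dynamic partition instead: ccm.* contains the partition column, and when you load a static partition the select must not include the partition column. With a dynamic partition the partition column has to come last in the select, which ccm.* satisfies because Hive lists partition columns after the regular columns.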
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table schema1.card_master partition(mkt_cd) --dynamic partition
select ccm.* --use alias
from schema2.card_master ccm
INNER JOIN schema1.accounts da on ccm.cm13 = da.cm13
;
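To confirm what the dynamic-partition load wrote, a quick check along these lines may help. It is only a sketch that reuses the table and partition column named in the question; which partitions appear depends on the rows that survive the join.

```sql
-- List the partitions that now exist on the target table.
SHOW PARTITIONS schema1.card_master;

-- Count the loaded rows per partition value.
SELECT mkt_cd, count(*) AS row_count
FROM schema1.card_master
GROUP BY mkt_cd;
```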

Related

Table to table insert w/o duplicates in hive

Table A is truncated and reloaded from each month's file, and table B is append-only.
So table A is loaded from a file into Hive.
Table B is populated by inserting (appending) the data from table A.
The issue is that table B is loaded with a straight select from table A, so the same data can end up inserted more than once.
How should I write the select query that inserts data from table A?
Both tables have a file date column (File_date).
A left join of A and B gives wrong counts in the insert.
And NOT EXISTS is not working for me in Hive.
The issue is:
Append table script (the target is partitioned by yearmonth):
Insert into table dist.t2
Select
Person_sk,
Np_id,
Yearmonth,
Insert_date,
File_date
From raw.ma
Data in table raw.ma (this table is truncated and reloaded):
File1 data: 201902
File2 data: 201903
File3 data: 201904
File4 data: if 201902 data gets loaded again, it should not duplicate the File1 data. It should either not be inserted or should overwrite that partition.
Here I need a filter or WHERE condition when appending data into dist.t2.
Can you please help with this?
I tried ALTER TABLE ... DROP PARTITION in Hive, but it fails in the Spark framework.
Please help me avoid inserting duplicate entries.
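One possible approach to the "overwrite that partition" option, sketched under assumptions: in Hive, INSERT OVERWRITE with a dynamic partition replaces only the partitions that appear in the select, so reloading 201902 would replace the 201902 partition and leave the other months alone. The sketch assumes dist.t2 is partitioned by yearmonth, that its regular columns are the four listed here in this order, and that raw.ma holds only the file that was just loaded.

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Assumption: raw.ma contains only the month(s) of the latest file.
INSERT OVERWRITE TABLE dist.t2 PARTITION (yearmonth)
SELECT
  person_sk,
  np_id,
  insert_date,
  file_date,
  yearmonth   -- the dynamic partition column goes last
FROM raw.ma;
```

If the statement is run through Spark SQL rather than Hive, it is worth verifying the overwrite scope on a test table first, since Spark has its own partition-overwrite settings.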

Create Temporary Table with Select and Values

I'm trying to create a temporary table in Hive as follows:
CREATE TEMPORARY TABLE mydb.tmp2
AS SELECT * FROM (VALUES (0, 'abc'))
AS T (id , mystr);
But that gives me the following error:
SemanticException [Error 10296]: Values clause with table constructor not yet supported
Is there another way to create a temporary table by explicitly and directly providing the values in the same command?
My ultimate goal is to run a MERGE command, and the temporary table would be inserted after the USING command. So something like this:
MERGE INTO mydb.mytbl
USING <temporary table>
...
Use a subquery instead of a temporary table:
MERGE INTO mydb.mytbl t
USING (SELECT 0 as id, 'abc' as mystr) tmp on tmp.id = t.id
Hive does not support the VALUES table constructor in this position yet. You can get the same result with the query below:
CREATE TEMPORARY TABLE mydb.tmp2
AS SELECT 0 as id, 'abc' as mystr;
For the merge, you can use the temporary table as shown below:
merge into target_table
using ( select * from mydb.tmp2) temp
on temp.id = target_table.id
when matched then update set ...
when not matched then insert values (...);
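If more than one literal row is needed, the same idea can be extended with UNION ALL inside a subquery. This is a sketch only: the second row (1, 'def') is made up for illustration, and the alias u is arbitrary.

```sql
-- Build a small temporary table from literal rows in a single statement.
CREATE TEMPORARY TABLE mydb.tmp2 AS
SELECT u.id, u.mystr
FROM (
  SELECT 0 AS id, 'abc' AS mystr
  UNION ALL
  SELECT 1 AS id, 'def' AS mystr   -- hypothetical extra row
) u;
```

The parenthesised subquery can also be placed directly after USING in the MERGE, which avoids the temporary table altogether.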

Can I use PARTITIONED BY after the table has been created?

create table t1 as select * from t2 where 1=2;
I am using the above code to create a table t1 from table t2. Table t2 is partitioned on 3 values, i.e. month, day, year. Once table t1 is created, it is not partitioned on those values.
I have tried the below code but it is giving me errors. Help!
create table t1 as
select * from t2 PARTITIONED BY( YEAR STRING, MONTH STRING, DAY STRING);
[42000]: Error while compiling statement: FAILED: ParseException line 1:0 cannot recognize input near 'PARTITIONED' 'BY' '(' in table source
You just need to correct the syntax: PARTITIONED BY ... goes right after CREATE TABLE, before AS SELECT.
create table t1 PARTITIONED BY(YEAR STRING,MONTH STRING,DAY STRING) as
select /*add other columns here*/,year,month,day
from t2;
It is better to list the columns explicitly instead of using *, and to put the partitioning columns at the end of the select.
The above answer is right for creating the partitions at table-creation time.
If the table was already created without partitions, one option is to load a partitioned table from it using INSERT OVERWRITE with dynamic partitioning.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE <partitioned_table_name> PARTITION(<partition_name>)
SELECT <column_1,... column_n, partition_name> from <source_table_name>;
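Since the original query used where 1=2 only to get an empty copy of t2, another option is CREATE TABLE ... LIKE, which copies the table definition including its partitioning. A minimal sketch, assuming the partition columns of t2 are year, month, day in that order; the INSERT is optional and only needed if the data should be copied too.

```sql
-- Empty table with the same columns and partition layout as t2.
CREATE TABLE t1 LIKE t2;

-- Optional: copy the data, letting Hive create the partitions dynamically.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- select * on a partitioned table returns the partition columns last.
INSERT OVERWRITE TABLE t1 PARTITION (year, month, day)
SELECT * FROM t2;
```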

Get actual target table insert count

I'm inserting data into hive external table in append mode. Every time I insert some records in a table, I want to get the count of actual records which are inserted into the hive external table. Is there any way I could find this information in any hive log file?
There is a workaround for this, though I am not aware of a Hive property that reports it.
Add a timestamp column to your table.
Self-join the table on the timestamp column, against its maximum value.
Count the rows that carry the latest timestamp. A sample query:
SELECT count(1) from (
SELECT tbl_alias.* FROM test_table tbl_alias JOIN
( select max(timestamp_date) as max_timestamp_date FROM test_table) max_timestamp_date_table ON
tbl_alias.timestamp_date=max_timestamp_date_table.max_timestamp_date ) outer_table;
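For that count to equal the number of rows in the last insert, every row of one load has to carry the same timestamp. One way to arrange this, as a sketch: current_timestamp() is evaluated once per query in Hive, so all rows written by one INSERT share the value. The name staging_table is hypothetical, and the sketch assumes test_table is unpartitioned with timestamp_date as its last column.

```sql
-- Append the new rows, stamping each of them with the same load time.
INSERT INTO TABLE test_table
SELECT s.*, current_timestamp() AS timestamp_date
FROM staging_table s;   -- hypothetical source of the new rows
```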

Drop Column from Selection After Join

I am trying to drop a number of columns after joining three tables. I came across a very useful SO post: SQL exclude a column using SELECT * [except columnA] FROM tableA?
However, I can't get the solution from the top answer there to work for my case:
I am joining multiple tables and want to drop the key that is used for the second and third tables, as well as some other columns.
Here is a simplified case whereby I'm just attempting to drop the key of the second table, which comes out as mykey_2:
SELECT * INTO Temptable
FROM
table_1 INNER JOIN table_2 ON
table_1.mykey=table_2.mykey INNER JOIN
table_3 ON table_1.mykey= table_3.mykey
WHERE table_1.A_FIELD = 'Some Selection'
ALTER TABLE Temptable
DROP COLUMN table_2.mykey
GO
SELECT * FROM Temptable
DROP TABLE Temptable
My console is giving me the error "ORA-00933: SQL command not properly ended".
How can I achieve this dropping of specific columns from the final selection?
For information I'm querying an Oracle database using Toad.
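A note on the error: SELECT ... INTO a new table and the GO batch separator are SQL Server constructs, which Oracle SQL does not accept, and that is consistent with ORA-00933. One way to avoid the duplicate key columns in Oracle is to qualify what is selected instead of dropping columns afterwards. This is a sketch only; col_a and col_b are hypothetical stand-ins for the columns actually wanted from the second and third tables.

```sql
SELECT t1.*,        -- every column of table_1, so mykey appears once
       t2.col_a,    -- hypothetical: only the table_2 columns that are wanted
       t3.col_b     -- hypothetical: only the table_3 columns that are wanted
FROM table_1 t1
INNER JOIN table_2 t2 ON t1.mykey = t2.mykey
INNER JOIN table_3 t3 ON t1.mykey = t3.mykey
WHERE t1.A_FIELD = 'Some Selection';
```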