aws glue drop partition using spark sql - amazon-emr

Dropping a partition from Glue metadata using Spark SQL throws an error, while the same statement works in the Hive shell.
**Hive shell**
hive> alter table prc_db.detl_stg drop IF EXISTS partition(prc_name="dq") ;
OK
Time taken: 1.013 seconds
**spark shell**
spark.sql(''' alter table prc_db.detl_stg drop IF EXISTS partition(prc_name="dq") ''') ;
Error message:
py4j.protocol.Py4JJavaError: An error occurred while calling o60.sql.
: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: InvalidObjectException(message:Unsupported expression (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException

If you want to drop a partition (in the Glue Catalog) from the Spark shell, you have to wrap the partition value in an extra pair of quotes. For example: spark.sql(""" alter table reinvent.newtesttable drop partition ( part= " 'part2' " ) """). For me it was: spark.sql(''' alter table prc_db.detl_stg drop IF EXISTS partition(prc_name="'dq'") ''')
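The nested quoting is easy to mistype, so it can be wrapped in a small helper. This is only a minimal sketch: `glue_drop_partition_sql` is a hypothetical name (not part of Spark or Glue), and it assumes a string-valued partition column and trusted inputs, since no escaping is done.

```python
def glue_drop_partition_sql(table: str, column: str, value: str) -> str:
    """Build a DROP PARTITION statement with the value quoted twice."""
    # The outer double quotes delimit the SQL string literal; the inner
    # single quotes are the extra pair described in the answer above.
    return f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({column}=\"'{value}'\")"


stmt = glue_drop_partition_sql("prc_db.detl_stg", "prc_name", "dq")
print(stmt)
# With an active SparkSession named `spark` (assumed), run: spark.sql(stmt)
```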

Related

Dynamically drop partition in hive sql

I need to drop data older than 6 months from the table; this needs to be part of a job that runs every day. I am using the code below
ALTER TABLE ab_test_cart_sbu_tableau_test_2 DROP IF EXISTS PARTITION (partition_day = add_months(current_date(),-6))
and getting the following error
Error: Error while compiling statement: FAILED: ParseException line
1:104 cannot recognize input near 'add_months' '(' 'current_date' in
constant (state=42000,code=40000)
ALTER TABLE ab_test_cart_sbu_tableau_test_2 DROP IF EXISTS PARTITION (partition_day = add_months(current_date(),-6))
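The parse error indicates that the partition spec expects a constant rather than a function call such as add_months(...). One possible workaround is to compute the date in the job that submits the statement and substitute it as a literal. The sketch below assumes the job is driven from Python and that partition_day holds 'yyyy-MM-dd' strings; `months_ago` and `drop_partition_sql` are hypothetical helper names, and the month arithmetic simply clamps the day to the end of the month, which may differ from Hive's add_months in edge cases.

```python
import calendar
from datetime import date


def months_ago(today: date, months: int) -> date:
    """Return the date `months` months before `today`, clamping the day."""
    index = today.year * 12 + (today.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    # Clamp, for example, the 31st to the last day of a shorter month
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def drop_partition_sql(table: str, column: str, cutoff: date) -> str:
    """Build the statement with the date as a plain string constant."""
    return (
        f"ALTER TABLE {table} DROP IF EXISTS "
        f"PARTITION ({column} = '{cutoff.isoformat()}')"
    )


cutoff = months_ago(date.today(), 6)
print(drop_partition_sql("ab_test_cart_sbu_tableau_test_2", "partition_day", cutoff))
```

Run daily, the equality form removes the one partition that has just aged past six months; the generated statement can then be passed to whatever client the job already uses to submit Hive SQL.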

Drop table only if it exists, or ignore drop error

I have a table MYLOG and would like to drop it before creating it, using the SQL script below.
If the table does not exist yet, the error below is thrown.
How can I bypass this error if the table does not exist?
The schema is set in an earlier script, which is not available in this SQL script:
set current schema MYSCHEMA
SQL script:
DROP TABLE MYLOG;
CREATE TABLE MYLOG (
TIME_STARTED TIMESTAMP NOT NULL,
USER_EMAIL VARCHAR(254) NOT NULL,
CONSTRAINT PK_TIME_STARTED_USER_EMAIL PRIMARY KEY (TIME_STARTED, USER_EMAIL)) ORGANIZE BY ROW;
COMMIT;
Error:
DROP TABLE MYLOG
SQLError: rc = 0 (SQL_SUCCESS)
SQLGetDiagRec: SQLState : S0002
fNativeError : -204
szErrorMsg : [IBM][CLI Driver][DB2/6000] SQL0204N "MYSCHEMA.MYLOG" is an undefined name. SQLSTATE=42704
This is a FAQ
There's more than one way to do it.
You can use compound SQL in your script with a continue handler for the SQLSTATE corresponding to the error you get if the table is not found, but this requires that you also use an alternative statement delimiter, as shown below:
--#SET TERMINATOR #
set current schema myschema#
BEGIN
DECLARE CONTINUE HANDLER FOR SQLSTATE '42704'
BEGIN end;
EXECUTE IMMEDIATE 'DROP TABLE MYLOG';
END #
CREATE TABLE MYLOG(... )#
You can also change the abort-on-first-error logic (if you use +s when running your script via the command line). You can update the Db2 CLP options on the fly inside your script via update command options using s off (to continue on error) or update command options using s on (to abort on error).
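Another option is to check whether the table exists in your schema with this query (unquoted names are stored in upper case in the catalog):
select tabname from syscat.tables where
tabschema='MYSCHEMA' and tabname='MYLOG'
If it exists, drop it:
drop table myschema.MYLOG
Then create it.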
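The check-then-drop approach can be sketched from a client program as follows. This assumes a Python DB-API 2.0 cursor that uses `?` parameter markers (the `cursor` object and the helper name `drop_table_if_exists` are illustrative, not from the original answers) and that the schema and table names are trusted, unquoted identifiers.

```python
def drop_table_if_exists(cursor, schema: str, table: str) -> bool:
    """Drop schema.table if the catalog lists it; return True if dropped."""
    # Unquoted Db2 identifiers are stored in upper case in the catalog
    schema, table = schema.upper(), table.upper()
    cursor.execute(
        "SELECT tabname FROM syscat.tables WHERE tabschema = ? AND tabname = ?",
        (schema, table),
    )
    if cursor.fetchone() is None:
        return False  # nothing to drop, so no SQL0204N error is raised
    # Identifiers cannot be bound as parameters, hence the string formatting
    cursor.execute(f"DROP TABLE {schema}.{table}")
    return True
```

Unlike the continue-handler version, this does two round trips and is not atomic, so another session could drop the table between the check and the DROP.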

How to truncate a partitioned external table in hive?

I'm planning to truncate a Hive external table which has one partition, so I used the following command:
hive> truncate table abc;
But it throws an error: Cannot truncate non-managed table abc.
Can anyone suggest how to do this?
Make your table MANAGED first:
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='FALSE');
Then truncate:
truncate table abc;
And finally you can make it external again:
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');
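If this sequence is run from a script, the three statements can be generated in order so that none is forgotten. A minimal sketch, where `truncate_external_table_sql` is a hypothetical helper and the table name is assumed to be trusted:

```python
from typing import List


def truncate_external_table_sql(table: str) -> List[str]:
    """Return the managed -> truncate -> external statements, in order."""
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES('EXTERNAL'='FALSE')",
        f"TRUNCATE TABLE {table}",
        f"ALTER TABLE {table} SET TBLPROPERTIES('EXTERNAL'='TRUE')",
    ]


for statement in truncate_external_table_sql("abc"):
    print(statement + ";")
```

If the truncate step fails, the table is left flagged as managed, so a real job should still run the last statement in that case.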
By default, TRUNCATE TABLE is supported only on managed tables. Attempting to truncate an external table results in the following error:
Error: org.apache.spark.sql.AnalysisException: Operation not allowed: TRUNCATE TABLE on external tables
Action Required
Change applications. Do not attempt to run TRUNCATE TABLE on an external table.
Alternatively, change applications to alter a table property to set external.table.purge to true to allow truncation of an external table:
ALTER TABLE mytable SET TBLPROPERTIES ('external.table.purge'='true');
There is an even better solution to this, which is basically a one liner.
insert overwrite table table_xyz select * from table_xyz where 1=2;
This deletes all the files and creates an empty file in the external table's location, leaving zero records.
Look at https://issues.apache.org/jira/browse/HIVE-4367 : use
truncate table my_ext_table force;

Hive Table is MANAGED or EXTERNAL - issue post table type conversion

I have a hive table in XYZ db named ABC.
When I run describe formatted XYZ.ABC; from Hue, I get the following:
Table Type: MANAGED_TABLE
Table Parameters: EXTERNAL True
So is this actually an external or a managed/internal hive table?
This is treated as an EXTERNAL table. Dropping the table will keep the underlying HDFS data. The table type is shown as MANAGED_TABLE because the parameter EXTERNAL is set to True instead of TRUE.
To fix this metadata, you can run this query:
hive> ALTER TABLE XYZ.ABC SET TBLPROPERTIES('EXTERNAL'='TRUE');
Some details:
The table XYZ.ABC must have been created via this kind of query:
hive> CREATE TABLE XYZ.ABC
<additional table definition details>
TBLPROPERTIES (
'EXTERNAL'='True');
Describing this table will give:
hive> desc formatted XYZ.ABC;
:
Location: hdfs://<location_of_data>
Table Type: MANAGED_TABLE
:
Table Parameters:
EXTERNAL True
Dropping this table will keep the data referenced in Location in describe output.
hive> drop table XYZ.ABC;
# does not drop table data in HDFS
The Table Type still shows as MANAGED_TABLE which is confusing.
Making the value for EXTERNAL as TRUE will fix this.
hive> ALTER TABLE XYZ.ABC SET TBLPROPERTIES('EXTERNAL'='TRUE');
Now, doing a describe will show it as expected:
hive> desc formatted XYZ.ABC;
:
Location: hdfs://<location_of_data>
Table Type: EXTERNAL_TABLE
:
Table Parameters:
EXTERNAL TRUE
Example:
Let's create a sample MANAGED table:
CREATE TABLE TEST_TBL(abc int, xyz string);
INSERT INTO TABLE test_tbl values(1, 'abc'),(2, 'xyz');
DESCRIBE FORMATTED test_tbl;
Changing type to EXTERNAL (in the wrong way using True, instead of TRUE):
ALTER TABLE test_tbl SET TBLPROPERTIES('EXTERNAL'='True');
Now let's DROP the table:
DROP TABLE test_tbl;
The result: the table is dropped but the data on HDFS isn't, which is the correct external table behavior.
If we re-create the table we can see data exists:
CREATE TABLE test_tbl(abc int, xyz string);
SELECT * FROM test_tbl;
Result:
The describe shows it wrongly as MANAGED TABLE along with EXTERNAL True because of:
.equals check in the meta
Hive Issue JIRA: HIVE-20057
Proposed fix: Use case insensitive equals
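The mismatch can be pictured with a toy model. This is not Hive's actual code, only an illustration of how an exact string comparison and a case-insensitive one disagree on the value True; both function names are made up for the example.

```python
def displayed_table_type(params: dict) -> str:
    """Exact, case-sensitive comparison: only the literal 'TRUE' counts."""
    return "EXTERNAL_TABLE" if params.get("EXTERNAL") == "TRUE" else "MANAGED_TABLE"


def behaves_as_external(params: dict) -> bool:
    """Case-insensitive comparison, in the spirit of the proposed fix."""
    return params.get("EXTERNAL", "").upper() == "TRUE"


params = {"EXTERNAL": "True"}
print(displayed_table_type(params), behaves_as_external(params))
```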

Hive Alter table change Column Name

I am trying to rename columns in Hive. Is there a way to do this? I want to go from
tableA(column1, _c1, _c2)
to
tableA(column1, column2, column3)
Change Column Name/Type/Position/Comment:
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
Example:
CREATE TABLE test_change (a int, b int, c int);
-- will change column a's name to a1
ALTER TABLE test_change CHANGE a a1 INT;
This command works only if the USE command has first been run to select the database you are working in. The column-rename syntax with DATABASE.TABLE throws an error and does not work. Version: Hive 0.12.
EXAMPLE:
hive> ALTER TABLE databasename.tablename CHANGE old_column_name new_column_name;
MismatchedTokenException(49!=90)
at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
at org.apache.hadoop.hive.ql.parse.HiveParser.alterStatementSuffixExchangePartition(HiveParser.java:11492)
...
hive> use databasename;
hive> ALTER TABLE tablename CHANGE old_column_name new_column_name;
OK
alter table table_name change old_col_name new_col_name new_col_type;
Here is an example:
hive> alter table test change userVisit userVisit2 STRING;
OK
Time taken: 0.26 seconds
hive> describe test;
OK
uservisit2 string
category string
uuid string
Time taken: 0.213 seconds, Fetched: 3 row(s)
In the comments, @libjack mentioned a point which is really important, and I would like to illustrate it further. First, we can check the columns of our table with the describe <table_name>; command.
There is a double column called _c1; such columns are created by Hive itself when we move data from one table to another. To reference these columns, we need to write the name inside backticks:
`_c1`
Finally, the ALTER command will be:
ALTER TABLE <table_name> CHANGE `<system_generated_column_name>` <new_column_name> <data_type>;
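When several generated columns need renaming, the statement can be built programmatically. A minimal sketch, assuming trusted names; `rename_column_sql` is a hypothetical helper, and the DOUBLE type in the usage line is an assumption based on the column described above.

```python
def rename_column_sql(table: str, old_name: str, new_name: str, column_type: str) -> str:
    """Build ALTER TABLE ... CHANGE with the old column name in backticks."""
    # Backticks are needed for names such as _c1 and are harmless for others
    return f"ALTER TABLE {table} CHANGE `{old_name}` {new_name} {column_type}"


print(rename_column_sql("tableA", "_c1", "column2", "DOUBLE") + ";")
```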