how to delete partitions from hive table dynamically? - sql

I am new to Hive. Can someone please help me with this requirement?
My requirement is to delete partitions dynamically. I have a SQL query that returns various regions (the query is shown below, after the ALTER TABLE line). Now I want to drop the partitions of my Hive table (it is partitioned by region) for the regions returned by that query.
I tried in the below way:
ALTER TABLE <TableName> PARTITION(region=tab.region)
FROM
select tab.region from
(SELECT * from Table1) tab join
(select filename from Table2) tab1
on tab1.filename = tab.filename
It's throwing the below exception:
'1:21:13 [ALTER - 0 row(s), 0.008 secs] [Error Code: 40000, SQL State: 42000] Error while compiling statement: FAILED: ParseException
line 1:77 cannot recognize input near 'tab' '.' 'region' in constant
... 1 statement(s) executed, 0 row(s) affected, exec/fetch time: 0.008/0.000 sec [0 successful, 0 warnings, 1 errors]
Could someone help me please?
Thanks in Advance

Shell-script:
$ cat test
#!/bin/bash
# split the query result on newlines only, since partition values may contain spaces
IFS=$'\n'
# get the partitions to drop
part=`hive -e "select tab.region from (SELECT * from Table1) tab join (select filename from Table2) tab1 on tab1.filename = tab.filename"`
for p in $part
do
  # keep only the value after the '=' in region=<value>
  partition=`echo "$p" | cut -d '=' -f2`
  echo "Dropping partitions .... $partition"
  # drop the partition
  hive -e "ALTER TABLE test_2 DROP PARTITION(region=\"$partition\")"
done
output:
$ ./test
OK
Time taken: 1.343 seconds, Fetched: 2 row(s)
Dropping partitions .... 2016-07-26 15%3A00%3A00
Dropped the partition region=2016-07-26 15%3A00%3A00
OK
Time taken: 2.686 seconds
Dropping partitions .... 2016-07-27
Dropped the partition region=2016-07-27
OK
Time taken: 1.612 seconds
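If starting a new Hive session for every partition is too slow, a variant is to collect all the ALTER TABLE statements and run them in a single hive -e call. This is a minimal, untested sketch assuming the same Table1/Table2 query and the test_2 table from above; IF EXISTS is added so a partition that has already disappeared does not abort the run:
#!/bin/bash
# split only on newlines, since partition values can contain spaces
IFS=$'\n'
stmts=""
for p in `hive -e "select tab.region from (SELECT * from Table1) tab join (select filename from Table2) tab1 on tab1.filename = tab.filename"`
do
  partition=`echo "$p" | cut -d '=' -f2`
  # accumulate one ALTER TABLE per partition, separated by semicolons
  stmts="$stmts ALTER TABLE test_2 DROP IF EXISTS PARTITION(region=\"$partition\");"
done
# start Hive once and run all the drops together
[ -n "$stmts" ] && hive -e "$stmts"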
Update: running a shell script from an HQL file (this is just a POC; adapt it to your requirements),
using ! <command>, which executes a shell command from within the Hive shell.
test_1.sh:
#!/bin/sh
echo " This massage is from $0 file"
hive-test.hql:
! echo showing databases... ;
show databases;
! echo showing tables...;
show tables;
! echo running shell script...;
! /home/cloudera/test_1.sh
output:
$ hive -v -f hive-test.hql
showing databases...
show databases
OK
default
retail_edw
sqoop_import
Time taken: 0.997 seconds, Fetched: 3 row(s)
showing tables...
show tables
OK
scala_departments
scaladepartments
stack
stackover_hive
Time taken: 0.062 seconds, Fetched: 4 row(s)
running shell script...
This message is from /home/cloudera/test_1.sh file
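Tying this back to the original requirement, one possible arrangement (just a sketch, and only a proof of concept since it starts a second Hive CLI from inside the first) is an HQL file that calls the partition-dropping script above via ! and then checks the result. drop-partitions.hql is a made-up name and the script is assumed to be saved as /home/cloudera/test:
drop-partitions.hql:
! echo dropping stale partitions... ;
! /home/cloudera/test
show partitions test_2;
Run it with:
$ hive -v -f drop-partitions.hql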

Related

BigQuery table not found error on location

I'm trying to run the script below, but I keep getting an error that the dataset isn't found. The problem is caused by $date in the SELECT query; how do I fix this?
My goal is to copy the table from another dataset, matching data on the date.
#!/bin/bash
date="20160805"
until [[ $date > 20160807 ]];
do
bq query --use_legacy_sql=false --destination_table="google_analytics.ga_sessions_${date}" 'SELECT g.* FROM `10241241.ga_sessions_$date` g, UNNEST (hits) as hits where hits.page.hostname="www.googlemerchandisestore.com" '
date=$(date +'%Y%m%d' -d "$date + 1 day")
done
Below error:
BigQuery error in query operation: Error processing job 'test-247020:bqjob_r6a2d68fbc6d04a34_000001722edd8043_1': Not found: Table test-247020:10241241.ga_sessions_ was not found in location EU
BigQuery error in query operation: Error processing job 'test-247020:bqjob_r5c42006229434f72_000001722edd85ae_1': Not found: Table test-247020:10241241.ga_sessions_ was not found in location EU
BigQuery error in query operation: Error processing job 'test-247020:bqjob_r6114e0d3e72b6646_000001722edd8960_1': Not found: Table test-247020:10241241.ga_sessions_ was not found in location EU
The problem is that you are using single quotes around the query, so bash doesn't replace $date with the value of the variable. You need to use double quotes for the query string (escaping the backticks and the inner double quotes):
date="20160805"
until [[ $date > 20160807 ]];
do
bq query --use_legacy_sql=false --destination_table="google_analytics.ga_sessions_${date}" "SELECT g.* FROM \`10241241.ga_sessions_$date\` g, UNNEST (hits) as hits where hits.page.hostname=\"www.googlemerchandisestore.com\" "
date=$(date +'%Y%m%d' -d "$date + 1 day")
done

Google BigQuery: bq query from command line fails, although it runs fine from the UI Query Editor

When I run the following query from the BQ UI, it runs fine and gives the desired output.
But when I run the same query from the command line, I get the following error:
bq query --destination_table
Chicago_Traffic_Sensor_Data.Latest_Traffic_Data_with_Geocoding
--replace --use_legacy_sql=false 'SELECT segmentid, _lif_lat, start_lon, _lit_lat, _lit_lon, _traffic, _last_updt, CASE WHEN
_traffic < 20 THEN '#FF0000' WHEN _traffic >= 20 and _traffic < 40 THEN '#FFFF00' WHEN _traffic >= 40 THEN '008000' ELSE '#666666' END as
strokeColor FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY segmentid
ORDER BY _last_updt DESC) col FROM
Chicago_Traffic_Sensor_Data.traffic_sensor_data) x WHERE x.col = 1
ORDER BY segmentid' Error in query string: Error processing job
'bedrock-gcp-testing:bqjob_r5d944587b08c4e54_000001626fa3f61d_1':
Syntax error: Unexpected end of statement at [1:412]
You need to escape any characters in your SQL that the shell is trying to interpret. What I find much easier and quicker is to put the SQL in a file and read it in from there instead. For example:
bq query --destination_table grey-sort-challenge:partitioning_magic.foobarred --use_legacy_sql=false "$(cat data.sql)"
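For the query in question, the data.sql used above would simply hold the statement verbatim, with no shell escaping at all:
SELECT segmentid, _lif_lat, start_lon, _lit_lat, _lit_lon, _traffic, _last_updt,
       CASE WHEN _traffic < 20 THEN '#FF0000'
            WHEN _traffic >= 20 AND _traffic < 40 THEN '#FFFF00'
            WHEN _traffic >= 40 THEN '008000'
            ELSE '#666666' END AS strokeColor
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY segmentid ORDER BY _last_updt DESC) col
      FROM Chicago_Traffic_Sensor_Data.traffic_sensor_data) x
WHERE x.col = 1
ORDER BY segmentid
You could then point it at the original destination table:
bq query --destination_table Chicago_Traffic_Sensor_Data.Latest_Traffic_Data_with_Geocoding --replace --use_legacy_sql=false "$(cat data.sql)"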

Pig - reading Hive table stored as Avro

I have created a Hive table stored in the Avro file format. I am trying to load the same Hive table using the Pig commands below:
pig -useHCatalog;
hive_avro = LOAD 'hive_avro_table' using org.apache.hive.hcatalog.pig.HCatLoader();
I get a "failed to read from hive_avro_table" error when I try to display "hive_avro" using the DUMP command.
Please help me resolve this issue. Thanks in advance.
create table hivecomplex
(name string,
phones array<INT>,
deductions map<string,float>,
address struct<street:string,zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '$'
MAP KEYS TERMINATED BY '#'
STORED AS AVRO
;
hive> select * from hivecomplex;
OK
John [650,999,9999] {"pf":500.0} {"street":"pleasantville","zip":88888}
Time taken: 0.078 seconds, Fetched: 1 row(s)
Now for Pig:
pig -useHCatalog;
a = LOAD 'hivecomplex' USING org.apache.hive.hcatalog.pig.HCatLoader();
dump a;
ne.util.MapRedUtil - Total input paths to process : 1
(John,{(650),(999),(9999)},[pf#500.0],(pleasantville,88888))
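The same two Pig statements can also be kept in a script file and run non-interactively (a small sketch; read_hive_avro.pig is just a made-up name):
read_hive_avro.pig:
-- load the Hive-managed Avro table through HCatalog and print it
a = LOAD 'hivecomplex' USING org.apache.hive.hcatalog.pig.HCatLoader();
DUMP a;
Run it with the HCatalog jars on the classpath:
$ pig -useHCatalog read_hive_avro.pig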

executing HIVE query in background

How do I execute a Hive query in the background when the query looks like the one below?
Select count(1) from table1 where column1='value1';
I am trying to run it from a script like the one below:
#!/usr/bin/ksh
exec 1> /home/koushik/Logs/`basename $0 | cut -d"." -f1 | sed 's/\.sh//g'`_$(date +"%Y%m%d_%H%M%S").log 2>&1
ST_TIME=`date +%s`
cd $HIVE_HOME/bin
./hive -e 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = ''value1'';'
END_TIME=`date +%s`
TT_SECS=$(( END_TIME - ST_TIME))
TT_HRS=$(( TT_SECS / 3600 ))
TT_REM_MS=$(( TT_SECS % 3600 ))
TT_MINS=$(( TT_REM_MS / 60 ))
TT_REM_SECS=$(( TT_REM_MS % 60 ))
printf "\n"
printf "Total time taken to execute the script="$TT_HRS:$TT_MINS:$TT_REM_SECS HH:MM:SS
printf "\n"
but getting error like
FAILED: SemanticException [Error 10004]: Line 1:77 Invalid table alias or column reference 'value1'
Please let me know exactly where I am making a mistake.
Create a document named example
vi example
Enter the query in the document and save it.
create table sample as
Select count(1) from table1 where column1='value1';
Now run the file in the background with the following command:
hive -f example 1>example.output 2>example.error &
You will get the result as
[1]
Now disown the process:
disown
Now the process will keep running in the background even after you close the shell. If you want to follow the job's progress, you can use
tail -f example.error
and the query result will be written to example.output.
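An alternative to disown (a minimal sketch, using the same example file) is to launch the job with nohup in the first place, so it is detached from the terminal immediately:
nohup hive -f example 1>example.output 2>example.error &
You can still follow the Hive progress messages with tail -f example.error while it runs.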
True, @Koushik! Glad that you found the issue.
In the query, bash was unable to form the hive query due to ambiguous single quotes.
Though SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1' is valid in hive,
$hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';' is not valid.
The best solution would be to use double quotes for the Value1 as
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
or use a quick-and-dirty solution that keeps the value in single quotes by wrapping each single quote in double quotes:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = '"'"Value1"'"';'
This makes sure the Hive query is properly formed and then executed accordingly. I wouldn't suggest this approach unless you really need the value to be in single quotes ;)
I was able to resolve it by replacing the single quotes with double quotes. The modified statement now looks like:
./hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
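Another way to sidestep the quoting entirely (a sketch using Hive's variable substitution; val is an arbitrary variable name) is to pass the literal in with --hivevar:
hive --hivevar val='Value1' -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "${hivevar:val}";'
The shell passes ${hivevar:val} through untouched, and Hive substitutes the value before compiling the statement.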

How to delete table entry from hive metastore when underlying hdfs file disappeared

There is a table for which the backing hdfs file no longer exists. Now the problem is that the "drop table" command fails:
Failed to load metadata for table: db.mytable
Caused by TableLoadingException: Failed to load metadata for table: db.mytable
File does not exist: hdfs://....
Caused by FileNotFoundException: File does not exist: hdfs:// ..
You can change the table's location to something valid and then drop it:
alter table mytable set location 'hdfs://valid/path';
drop table mytable;
Here is an example:
[root#*****]# /opt/hive/bin/hive -e "create external table test (a string) ";
OK
Time taken: 0.822 seconds
[root#*****]# /opt/hive/bin/hive -e "desc extended test";
OK
a string
Detailed Table Information Table(tableName:test, dbName:default, owner:root, createTime:1459611644, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:a, type:string, comment:null)], location:hdfs://ec2-23-20-175-171.compute-1.amazonaws.com:8020/user/hive/warehouse/test, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1459611644}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
Time taken: 0.587 seconds, Fetched: 3 row(s)
Destroy the Hadoop cluster, then attempt to drop the table ...
[root#*****]# /opt/hive/bin/hive -e "drop table test";
The command hangs with the following in the log:
2016-04-02 11:44:33,677 INFO [pool-4-thread-3]: ipc.Client (Client.java:handleConnectionFailure(666)) - Retrying connect to server: ec2-23-20-175-171.compute-1.amazonaws.com/23.20.175.171:8020. Already tried 3 time(s).
Set the location to something valid:
[root#*****]# /opt/hive/bin/hive -e "alter table test set location 's3n://*****/test'";
OK
Time taken: 0.807 seconds
Attempt to drop the table again:
[root#*****]# /opt/hive/bin/hive -e "drop table test";
OK
Time taken: 1.097 seconds