No data for the Hive table created from an avsc schema

I imported data from MySQL to HDFS in Avro format. I moved the .avsc file to HDFS and created a Hive table using that .avsc file. Please see the details below:
.AVSC file : -rw-r--r-- 3 jonnavithulasivakrishna hdfs 1041 2017-09-13 00:05 hdfs://nn01.itversity.com:8020/user/jonnavithulasivakrishna/products.avsc
Table created :
hive (siv_sqoop_import)> CREATE EXTERNAL TABLE Products_1
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> location '/user/jonnavithulasivakrishna/products'
> TBLPROPERTIES('avro.schema.url'='hdfs://nn01.itversity.com:8020/user/jonnavithulasivakrishna/products.avsc');
OK
Time taken: 0.155 seconds
hive (siv_sqoop_import)> select * from Products_1 limit 10;
OK
Time taken: 0.294 seconds
As you can see, the table was created but contains no data. Could you please help me understand why I'm not getting data in this table?

Please find the sqoop command below :
sqoop import \
--connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" \
--username retail_dba -P \
--table products \
--as-avrodatafile \
--num-mappers 6 \
--target-dir "/user/jonnavithulasivakrishna/products"
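One thing worth checking first (not shown in the post) is whether the Sqoop job actually wrote .avro files into the directory the external table points at, and that the table's location matches the --target-dir; a minimal sketch, assuming the paths above:
hdfs dfs -ls /user/jonnavithulasivakrishna/products
# expect one part-m-*.avro file per mapper (6 here) plus _SUCCESS
hive -e "DESCRIBE FORMATTED siv_sqoop_import.Products_1;" | grep -i location
# the Location value should be hdfs://nn01.itversity.com:8020/user/jonnavithulasivakrishna/products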

Related

SQL queries in BASH script

I am using Hadoop to execute my queries.
What I want is to use Bash variables within my query. Here is an example:
export month="date +%m"
export year="date +%Y"
beeline -u 'jdbc:hive2://clustername.azurehdinsight.net:443/tab' \
-n myname -e "select * from mytable where month = '$month' and
year = '$year';"
But the query returns nothing, even though
select * from mytable where month = '$month' and
year = '$year';
returns results when I run it directly in Hive.
Is there a problem in my bash script?
You need to execute the date command using command substitution $(); change
export month="date +%m"
export year="date +%Y"
with
export month=$(date +%m)
export year=$(date +%Y)
You can also use --hivevar arguments with beeline; single-quote the query so the shell passes ${hivevar:...} through to Hive untouched:
beeline -u jdbc:hive2://clustername.azurehdinsight.net:443/tab \
-n myname \
--hivevar month=$month \
--hivevar year=$year \
-e 'select * from mytable where month = "${hivevar:month}" and year = "${hivevar:year}";'
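If the result is still empty, a quick sanity check (purely illustrative, using the same names as above) is to echo the query the shell would send, so you can see exactly what beeline receives:
export month=$(date +%m)
export year=$(date +%Y)
echo "select * from mytable where month = '$month' and year = '$year';"
# prints something like: select * from mytable where month = '09' and year = '2017';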

unable to sqoop data from oracle data source using split by on date column

query:
sqoop import --connect "*****" \
--username **** \
-P ****** \
--query "select * from table_name where trunc(date_column)>=ADD_MONTHS(TRUNC(sysdate,'YEAR'),-12) and \$CONDITIONS" \
--split-by date_column \
-m 4
error:
java.sql.SQLDataException: ORA-01861: literal does not match format string (see the attached screenshot for the full stack trace)
This looks like an issue with the format of the split-by column. Try converting the split-by column to a number, for example:
--split-by "to_number(to_char(date_column, 'YYYYMMDDHHMISS'))"
or whatever format is required.
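Putting that suggestion back into the command from the question (the connection string, username, table and column names are still the placeholders used above), the import would look roughly like this:
sqoop import --connect "*****" \
--username **** \
-P \
--query "select * from table_name where trunc(date_column)>=ADD_MONTHS(TRUNC(sysdate,'YEAR'),-12) and \$CONDITIONS" \
--split-by "to_number(to_char(date_column, 'YYYYMMDDHHMISS'))" \
-m 4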

Sqoop Incremental Import Error using Date Column

I'm trying to do an incremental import using Sqoop. It works for the id column, but when I try the same thing with the date column it throws an error; the date column in MySQL is stored as varchar.
code :
sqoop import --connect jdbc:mysql://$hostName/$dbName --username $userName --password $pass \
--query "select * , 'username' as user_name, 'date' as created_date from $tableName WHERE $cloumnName between ${values[0]} and ${values[1]} AND \$CONDITIONS" \
--target-dir outputPath --append --m mapperNo --split-by splitByValue \
--check-column cloumnName --incremental $imode
Error :
16/12/20 01:18:46 ERROR tool.ImportTool: Error during import: Character column (date_field) can not be used to determine which rows to incrementally import.
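For reference, the shape that the question says already works keys the increment off the numeric id column; a minimal sketch of that working variant (the --last-value here is only an illustrative placeholder):
sqoop import --connect jdbc:mysql://$hostName/$dbName \
--username $userName --password $pass \
--table $tableName \
--target-dir outputPath \
--incremental append \
--check-column id \
--last-value 0 \
-m 1
Sqoop will not accept a character-typed --check-column, which is exactly what the error above says about the varchar date field.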

how to delete partitions from hive table dynamically?

I am new to Hive. Can someone please help me with this requirement?
My requirement is to delete partitions dynamically. I have a SQL query that returns various regions (the query is the part after FROM in the attempt below). Now I want to drop the partitions of my Hive table for the regions returned by that query.
I tried it the below way:
ALTER TABLE <TableName> PARTITION(region=tab.region)
FROM
select tab.region from
(SELECT * from Table1) tab join
(select filename from Table2) tab1
on tab1.filename = tab.filename
It's throwing the below exception:
'1:21:13 [ALTER - 0 row(s), 0.008 secs] [Error Code: 40000, SQL State: 42000] Error while compiling statement: FAILED: ParseException
line 1:77 cannot recognize input near 'tab' '.' 'region' in constant
... 1 statement(s) executed, 0 row(s) affected, exec/fetch time: 0.008/0.000 sec [0 successful, 0 warnings, 1 errors]
Could someone help me please?
Thanks in Advance
Shell-script:
$ cat test
#!/bin/sh
IFS=$'\n'
#get partitions
part=`hive -e "select tab.region from (SELECT * from Table1) tab join (select filename from Table2) tab1 on tab1.filename = tab.filename"`
for p in $part
do
partition=`echo $p|cut -d '=' -f2`
echo Dropping partitions .... $partition
#drop partitions
hive -e "ALTER TABLE test_2 DROP PARTITION(region=\"$partition\")"
done
output:
$ ./test
OK
Time taken: 1.343 seconds, Fetched: 2 row(s)
Dropping partitions .... 2016-07-26 15%3A00%3A00
Dropped the partition region=2016-07-26 15%3A00%3A00
OK
Time taken: 2.686 seconds
Dropping partitions .... 2016-07-27
Dropped the partition region=2016-07-27
OK
Time taken: 1.612 seconds
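A small optional refinement of the same script (a sketch only, using the same table names as above): running the queries with hive -S (silent mode) keeps the OK / Time taken lines out of the console output:
#!/bin/bash
IFS=$'\n'
# get partitions; -S suppresses Hive's "OK" / "Time taken" console messages
part=$(hive -S -e "select tab.region from (SELECT * from Table1) tab join (select filename from Table2) tab1 on tab1.filename = tab.filename")
for p in $part
do
  partition=$(echo "$p" | cut -d '=' -f2)
  echo "Dropping partition: $partition"
  # drop the partition
  hive -S -e "ALTER TABLE test_2 DROP PARTITION(region=\"$partition\")"
done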
Update: you can also run a shell script from an HQL file (this is just a POC; adjust it to your requirements).
Using ! <command> executes a shell command from the Hive shell.
test_1.sh:
#!/bin/sh
echo "This message is from $0 file"
hive-test.hql:
! echo showing databases... ;
show databases;
! echo showing tables...;
show tables;
! echo running shell script...;
! /home/cloudera/test_1.sh
output:
$ hive -v -f hive-test.hql
showing databases...
show databases
OK
default
retail_edw
sqoop_import
Time taken: 0.997 seconds, Fetched: 3 row(s)
showing tables...
show tables
OK
scala_departments
scaladepartments
stack
stackover_hive
Time taken: 0.062 seconds, Fetched: 4 row(s)
running shell script...
This message is from /home/cloudera/test_1.sh file

Extra backslash in Sqoop import result

Currently I'm using Sqoop to import data from an HP Vertica database into Hive. For some columns with special characters, the result differs from the data in the Vertica DB. Here is the code:
sqoop import --driver com.vertica.jdbc.Driver --connect jdbc:vertica://db.foo.com/corp \
--username xx -P --where 'SRC_SYS_CD=xxx' --null-string '\\N' --null-non-string '\\N' \
--m 1 --fields-terminated-by '\001' --hive-drop-import-delims --table addr \
--target-dir /xxxx/addr
Data in vertica DB:
SRC_SYS_CD CTRY_CD ADDR_ID ADDR_TYP_CD ADDR_STR_1_LG_NM
123456 NZ 107560 NULL C\ - 108 Waiatarua Road
Data showed in Hive DB:
SRC_SYS_CD CTRY_CD ADDR_ID ADDR_TYP_CD ADDR_STR_1_LG_NM
123456 NZ 107560 NULL C\\ - 108 Waiatarua Road
The only difference is in the column ADDR_STR_1_LG_NM: after the Sqoop import, an extra backslash (\) was added. Other columns that do not contain a backslash were unchanged.
Since there are NULLs in Vertica, we must use --null-string '\\N' --null-non-string '\\N'.
I've tried some other options like:
--escaped-by \\ --optionally-enclosed-by '\"'
But that doesn't work.
For databases that Sqoop supports with a direct connector, using --direct and removing --hive-drop-import-delims will import the data as-is.
This link lists the databases that Sqoop supports for direct connect.
I've confirmed that Vertica works with direct connect even though it is not listed there.
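Applying that suggestion to the command in the question would look roughly like this (a sketch only; it keeps the same placeholders, adds --direct, and drops --hive-drop-import-delims, assuming the Vertica direct connector is available on the cluster):
sqoop import --driver com.vertica.jdbc.Driver --connect jdbc:vertica://db.foo.com/corp \
--username xx -P --where 'SRC_SYS_CD=xxx' --null-string '\\N' --null-non-string '\\N' \
--direct --m 1 --fields-terminated-by '\001' --table addr \
--target-dir /xxxx/addr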