Hive unable to move query results to a folder

I have written a SELECT query in Hive to move data to a particular folder, but I am getting an error.
Please help.
Moving data to local directory /Dataproviders/DataSurgery/Order/out/jul24msngtxn/negtxns
Failed with exception Unable to move source hdfs://mycluster/tmp/hive/sshuser/253d3089-fcc0-4656-82ca-ccbe893196ed/hive_2018-08-16_06-58-29_220_388527949811395742-1/-mr-10000 to destination /Dataproviders/DataSurgery/Order/out/jul24msngtxn/negtxns
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
INSERT OVERWRITE LOCAL DIRECTORY '/Dataproviders/DataSurgery/Order/out/jul24msngtxn/negtxns/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\034'
STORED AS TEXTFILE
select * from sourcetable;
I have given full permissions to the following folders, but the issue still exists:
hdfs dfs -chmod 777 /tmp/hive
hdfs dfs -chmod -R 777 /Dataproviders/DataSurgery/

I made a terrible mistake: the LOCAL keyword should not be present when writing to an HDFS directory. I removed it and the query worked fine.
The correct query is below.
INSERT OVERWRITE DIRECTORY '/Dataproviders/DataSurgery/Order/out/jul24msngtxn/negtxns/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\034'
STORED AS TEXTFILE
select * from sourcetable;

Related

Exporting Hive Table Data into .csv

This question may have been asked before; I am relatively new to Hadoop and Hive. As a test, I'm trying to export content to see if I am doing things correctly. The code is below.
Use MY_DATABASE_NAME;
INSERT OVERWRITE LOCAL DIRECTORY '/random/directory/test'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY "\n"
SELECT date_ts,script_tx,sequence_id FROM dir_test WHERE date_ts BETWEEN '2018-01-01' and '2018-01-02';
That is what I have so far, but it generates multiple files, and I want to combine them into a single .csv or .xls file to work on. My question is: what do I do next to accomplish this?
Thanks in advance.
You can achieve this in the following ways (a sketch of the getmerge approach follows below):
Force a single reducer in the query, e.g. with ORDER BY <col_name>, so only one output file is produced
Store the output to HDFS and then use the command hdfs dfs -getmerge [-nl] <src> <localdest>
Using beeline: beeline --outputformat=csv2 -f query_file.sql > <file_name>.csv
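For instance, a minimal sketch of the getmerge approach, assuming a hypothetical HDFS staging directory /tmp/dir_test_export and a Hive version that supports ROW FORMAT with INSERT OVERWRITE DIRECTORY:
INSERT OVERWRITE DIRECTORY '/tmp/dir_test_export'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
SELECT date_ts,script_tx,sequence_id FROM dir_test WHERE date_ts BETWEEN '2018-01-01' and '2018-01-02';
Then, on the edge node, merge the part files into a single local CSV (the local path is also hypothetical):
hdfs dfs -getmerge /tmp/dir_test_export /home/myuser/dir_test.csv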

LOAD DATA INPATH table files start with some string in Impala

Just a simple question; I'm new to Impala.
I want to load data from HDFS into my data lake using Impala.
I have a CSV file, this_is_my_data.csv, and I want to load it without specifying the full file name, something like the following:
LOAD DATA INPATH 'user/myuser/this_is.*' INTO TABLE my_table;
That is, a string starting with this_is and whatever follows.
If you need some additional information, please let me know. Thanks in advance.
The documentation says:
You can specify the HDFS path of a single file to be moved, or the HDFS path of a directory to move all the files inside that directory. You cannot specify any sort of wildcard to take only some of the files from a directory.
The workaround is to put your files into the table directory using the mv or cp command. Check your table's directory with the DESCRIBE FORMATTED command, then run mv or cp (in a shell, not Impala, of course):
hdfs dfs -mv "user/myuser/this_is.*" "/user/cloudera/mytabledir"
Or put the files you need to load into some staging directory first, then load the whole directory (see the sketch below).
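For example, a minimal sketch of the first workaround, assuming DESCRIBE FORMATTED reports a hypothetical table location of /user/hive/warehouse/mydb.db/my_table:
DESCRIBE FORMATTED my_table;
hdfs dfs -cp "user/myuser/this_is.*" /user/hive/warehouse/mydb.db/my_table
REFRESH my_table;
The cp runs in a shell; DESCRIBE FORMATTED and REFRESH run in impala-shell, with REFRESH making the newly copied files visible to Impala.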

How do I export database table data from HDFS into a local CSV using Hive without write permission

I don't have write permission on the HDFS cluster.
I am accessing database tables created/stored on HDFS using Hive via an edge node.
I have read access.
I want to export data from tables located on HDFS into a CSV on my local system.
How should I do it?
insert overwrite local directory '/____/____/' row format delimited fields terminated by ',' select * from table;
Note that this may create multiple files and you may want to concatenate them on the client side after it's done exporting.
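For example, if the export was written to a hypothetical /home/myuser/export directory on the edge node, the part files can be combined like this:
cat /home/myuser/export/* > /home/myuser/mytable.csv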

Hadoop Distcp aborting when copying data from one cluster to another

I am trying to copy data of a partitioned Hive table from one cluster to another.
I am using distcp to copy the data, but the underlying data belongs to a partitioned Hive table.
I used the following command.
hadoop distcp -i {src} {tgt}
But as the table was partitioned, the directory structure was created according to the partitions, so distcp is reporting duplicate files and aborting the job.
org.apache.hadoop.tools.CopyListing$DuplicateFileException: File would cause duplicates. Aborting
I also tried -skipcrccheck -update -overwrite, but none of them worked.
How to copy the data of a table from partitioned file path to destination?
Try using the option -strategy dynamic; by default, distcp uses uniformsize.
Check the settings below to see if they are false. If so, set them to true (an example of enabling them follows the listing).
hive> set hive.mapred.supports.subdirectories;
hive.mapred.supports.subdirectories=false
hive> set mapreduce.input.fileinputformat.input.dir.recursive;
mapreduce.input.fileinputformat.input.dir.recursive=false
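If they are false, they can be enabled for the session before running the copy, for example:
hive> set hive.mapred.supports.subdirectories=true;
hive> set mapreduce.input.fileinputformat.input.dir.recursive=true;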
hadoop distcp -Dmapreduce.map.memory.mb=20480 -Dmapreduce.map.java.opts=-Xmx15360m -Dipc.client.fallback-to-simple-auth-allowed=true -Ddfs.checksum.type=CRC32C -m 500 \
-pb -update -delete {src} {target}
Ideally, two files can't have the same path. What's happening in your case is that you are trying to copy a partitioned table from one cluster to another, and two differently named partitions contain files with the same name.
The solution is to correct the source path {src} in your command so that you provide the path up to the parent of the partition subdirectories, not the individual files.
For example, refer to the layout below:
/a/partcol=1/file1.txt
/a/partcol=2/file1.txt
If you use {src} as "/a/*/*", then you will get the error "File would cause duplicates."
But if you use {src} as "/a", then the copy will succeed without that error.
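For example, with the layout above and reusing the question's {tgt} placeholder, a sketch of the corrected command points at the table directory rather than the partition files:
hadoop distcp -i /a {tgt}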

writing hive query results to hdfs

I can export Hive query results using this:
INSERT OVERWRITE LOCAL DIRECTORY '/home/user/events/'
but if I want to export it to an HDFS directory at /user/events/, how do I do that? I tried this:
INSERT OVERWRITE DIRECTORY '/user/user/events/'
row format delimited
fields terminated by '\t'
select * from table;
but get this error then:
FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in statement
Remove the LOCAL keyword; it specifies the local filesystem. Without it, the result will go to HDFS. You do still need OVERWRITE, though. So:
INSERT OVERWRITE DIRECTORY '/user/events/'
select * from table;