I have written a SELECT query in Hive to move data to a particular folder, but I am getting an error. Please help.
Moving data to local directory /Dataproviders/DataSurgery/Order/out/jul24msngtxn/negtxns
Failed with exception Unable to move source hdfs://mycluster/tmp/hive/sshuser/253d3089-fcc0-4656-82ca-ccbe893196ed/hive_2018-08-16_06-58-29_220_388527949811395742-1/-mr-10000 to destination /Dataproviders/DataSurgery/Order/out/jul24msngtxn/negtxns
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
INSERT OVERWRITE LOCAL DIRECTORY '/Dataproviders/DataSurgery/Order/out/jul24msngtxn/negtxns/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\034'
STORED AS TEXTFILE
select * from sourcetable;
I have given full permissions to the following folders, but the issue still exists:
hdfs dfs -chmod 777 /tmp/hive
hdfs dfs -chmod -R 777 /Dataproviders/DataSurgery/
I made a terrible mistake: the keyword LOCAL should not be present when writing to an HDFS directory. I removed it and the query worked fine. Please find the correct query below:
INSERT OVERWRITE DIRECTORY '/Dataproviders/DataSurgery/Order/out/jul24msngtxn/negtxns/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\034'
STORED AS TEXTFILE
select * from sourcetable;
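To double-check that the output landed where expected, you can list the destination directory in HDFS (the path is simply the one from the query above; Hive typically names the output files 000000_0, 000001_0, and so on):

# list the files Hive wrote to the destination directory
hdfs dfs -ls /Dataproviders/DataSurgery/Order/out/jul24msngtxn/negtxns/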
This question may have been asked before; I am relatively new to Hadoop and Hive. I'm trying to export content, as a test, to see if I am doing things correctly. The code is below.
Use MY_DATABASE_NAME;
INSERT OVERWRITE LOCAL DIRECTORY '/random/directory/test'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY "\n"
SELECT date_ts,script_tx,sequence_id FROM dir_test WHERE date_ts BETWEEN '2018-01-01' and '2018-01-02';
That is what I have so far, but it generates multiple files, and I want to combine them into a single .csv or .xls file to work on. My question is: what do I do next to accomplish this?
Thanks in advance.
You can achieve this in the following ways (see the sketch after this list):
Force a single reducer in the query, e.g. with ORDER BY <col_name>
Store to HDFS and then use the command hdfs dfs -getmerge [-nl] <src> <localdst>
Use beeline: beeline --outputformat=csv2 -f query_file.sql > <file_name>.csv
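For example, a minimal sketch of the second approach, using the table and columns from the question (the export directory and local file name are only placeholders):

-- write the result to an HDFS directory instead of a local one
INSERT OVERWRITE DIRECTORY '/tmp/dir_test_export'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
SELECT date_ts, script_tx, sequence_id FROM dir_test
WHERE date_ts BETWEEN '2018-01-01' AND '2018-01-02';

Then, from a shell on the edge node:

# merge all part files in the HDFS directory into one local CSV
hdfs dfs -getmerge /tmp/dir_test_export /home/myuser/dir_test.csv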
Just a simple question; I'm new to Impala.
I want to load data from HDFS into my data lake using Impala.
So I have a CSV file, this_is_my_data.csv, and what I want to do is load the file without specifying the full file name, something like the following:
LOAD DATA INPATH 'user/myuser/this_is.*' INTO TABLE my_table;
That is, a string starting with this_is and whatever follows.
If you need some additional information, please let me know. Thanks in advance.
The documentation says:
You can specify the HDFS path of a single file to be moved, or the
HDFS path of a directory to move all the files inside that directory.
You cannot specify any sort of wildcard to take only some of the files
from a directory.
The workaround is to put your files into the table directory using the mv or cp command. Find your table directory with the DESCRIBE FORMATTED command, then run mv or cp (in a shell, not in Impala, of course):
hdfs dfs -mv "user/myuser/this_is.*" "/user/cloudera/mytabledir"
Or put the files you need to load into some directory first, then load the whole directory.
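A minimal sketch of the whole sequence (the warehouse path shown is an assumption; use whatever Location: value DESCRIBE FORMATTED reports for your table):

-- in impala-shell: find the table's HDFS location
DESCRIBE FORMATTED my_table;

# in a shell: copy only the matching files into the table directory
hdfs dfs -cp "user/myuser/this_is.*" /user/hive/warehouse/my_db.db/my_table/

-- back in impala-shell: make Impala pick up files added outside of Impala
REFRESH my_table;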
I don't have write permission on the HDFS cluster.
I am accessing database tables created/stored on HDFS using Hive via an edge node.
I have read access.
I want to export data from tables located on HDFS into a CSV on my local system.
How should I do it?
INSERT OVERWRITE LOCAL DIRECTORY '/____/____/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
SELECT * FROM table;
Note that this may create multiple files and you may want to concatenate them on the client side after it's done exporting.
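For example, a sketch of the concatenation step (the local directory below is just a placeholder for whichever path you used in the INSERT; Hive names the part files 000000_0, 000001_0, and so on):

# combine all part files written by Hive into a single CSV
cat /home/myuser/export/* > /home/myuser/export.csv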
I am trying to copy the data of a partitioned Hive table from one cluster to another.
I am using distcp to copy the data, but the underlying data is that of a partitioned Hive table.
I used the following command.
hadoop distcp -i {src} {tgt}
But as the table was partitioned, the directory structure was created according to the partitions, so distcp reports an error about creating duplicates and aborts the job.
org.apache.hadoop.tools.CopyListing$DuplicateFileException: File would cause duplicates. Aborting
I also tried -skipcrccheck, -update and -overwrite, but none of them worked.
How do I copy the data of a table from a partitioned file path to the destination?
Try using the option -strategy dynamic.
By default, distcp uses the uniformsize strategy.
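For example (the source and target paths are the same placeholders as in the question):

# use the dynamic copy strategy instead of the default uniformsize
hadoop distcp -strategy dynamic {src} {tgt}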
Check the settings below to see if they are false. If so, set them to true:
hive> set hive.mapred.supports.subdirectories;
hive.mapred.supports.subdirectories=false
hive> set mapreduce.input.fileinputformat.input.dir.recursive;
mapreduce.input.fileinputformat.input.dir.recursive=false
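A minimal sketch for enabling both properties for the session (these are standard Hive/MapReduce settings):

hive> set hive.mapred.supports.subdirectories=true;
hive> set mapreduce.input.fileinputformat.input.dir.recursive=true;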
hadoop distcp -Dmapreduce.map.memory.mb=20480 -Dmapreduce.map.java.opts=-Xmx15360m -Dipc.client.fallback-to-simple-auth-allowed=true -Ddfs.checksum.type=CRC32C -m 500 \
-pb -update -delete {src} {target}
Ideally, there can't be identical file names. What is happening in your case is that you are trying to copy a partitioned table from one cluster to another, and two differently named partitions contain files with the same name.
The solution is to correct the source path {src} in your command so that you provide the path up to the directory containing the partition subdirectories, not the files themselves.
For example:
/a/partcol=1/file1.txt
/a/partcol=2/file1.txt
If you use "/a/*/*" as {src}, you will get the error "File would cause duplicates."
But if you use "/a" as {src}, the copy will not produce that error.
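A sketch of the corrected call using the example paths above (the target cluster URI is an assumption):

# copy the table directory as a whole; partition subdirectories are preserved
hadoop distcp -update /a hdfs://targetcluster/a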
I can export Hive query results using this:
INSERT OVERWRITE LOCAL DIRECTORY '/home/user/events/'
but if I want to export them to an HDFS directory at /user/events/,
how do I do that? I tried this:
INSERT OVERWRITE DIRECTORY '/user/user/events/'
> row format delimited
> fields terminated by '\t'
> select * from table;
but then I get this error:
FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in statement
Remove the LOCAL keyword; it specifies the local filesystem. Without it, the result will go to HDFS. You may actually need to use OVERWRITE, though. So:
INSERT OVERWRITE DIRECTORY '/user/events/'
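Putting this together with the clauses from the question, a sketch of the full statement (the table name is the placeholder from the question; note that the ROW FORMAT clause with INSERT OVERWRITE DIRECTORY is only supported on reasonably recent Hive versions, roughly 0.11 and later):

-- write tab-delimited output to an HDFS directory
INSERT OVERWRITE DIRECTORY '/user/events/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
SELECT * FROM table;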