Loading data to a Hive table using a file in the local file system - hive

While loading a file into a Hive table, I get "unable to move source file". I'm using the Cloudera virtual box.
Attaching screenshots of (1) the error, (2) the file I'm uploading, and (3) the Hive table schema.

Related

How to load multiple files (same schema) from LOCAL into a table in BigQuery?

I have multiple files in a folder on my local machine, and every file has the same schema. How can I upload those files into BQ with one CLI command?
I tried this:
bq load --source_format=NEWLINE_DELIMITED_JSON --ignore_unknown_values temp.test_load ./* ./schema.json
But I got this error:
Too many positional args, still have...(and it continues with the names of all files in the folder)
But when I specify the file name, it uploads into BQ without any error:
bq load --source_format=NEWLINE_DELIMITED_JSON --ignore_unknown_values temp.test_load ./file_1.ndjson.gz ./schema.json (this one is working)
How can I do multi-file uploads from local?
When loading data from local files to BigQuery, files can only be loaded individually, as wildcards and comma-separated lists are not supported for local files.
Wildcards are only supported when loading data from Cloud Storage to BigQuery, provided all the files share a common base-name.
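As a workaround, here is a minimal shell sketch, assuming the files sit in the current directory, end in .ndjson.gz as in the example above, and target the same temp.test_load table; the staging bucket name in the second option is hypothetical:
# Option 1: load each local file individually in a loop
for f in ./*.ndjson.gz; do
  bq load --source_format=NEWLINE_DELIMITED_JSON --ignore_unknown_values temp.test_load "$f" ./schema.json
done
# Option 2: stage the files in Cloud Storage, then load them in one job with a wildcard
gsutil -m cp ./*.ndjson.gz gs://my-staging-bucket/test_load/
bq load --source_format=NEWLINE_DELIMITED_JSON --ignore_unknown_values temp.test_load "gs://my-staging-bucket/test_load/*.ndjson.gz" ./schema.json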

Output of Hive extract to local drive error

I have created a directory in HDFS as Test, but when I try to insert records
from Hive into HDFS it throws this error:
Error: WARNING:root:could not open file '/etc/apt/sources.list.d/dotnetdev.list'
WARNING:root:could not open file '/etc/apt/sources.list.d/HDP.list'
INSERT: command not found
The command I used:
INSERT OVERWRITE DIRECTORY '/Test' select * from table limit 10
Please help
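For reference, a minimal sketch of how such a statement is usually submitted, with the table and directory names taken verbatim from the question; the "INSERT: command not found" line suggests the statement may have been typed at the OS shell prompt rather than inside Hive, though that is only an assumption:
# Submit the HiveQL through the hive CLI instead of the bash prompt
hive -e "INSERT OVERWRITE DIRECTORY '/Test' SELECT * FROM table LIMIT 10;"
# Then check the exported files in HDFS
hdfs dfs -ls /Test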

HIVE> FAILED: SemanticException Line 1:23 Invalid path

I tried to load the data into my table 'users' in LOCAL mode, and I am using Cloudera on my virtual box. I have placed my file inside the /home/cloudera/Desktop/Hive/ directory, but I am getting an error:
FAILED: SemanticException Line 1:23 Invalid path ''/home/cloudera/Desktop/Hive/hive_input.txt'': No files matching path file:/home/cloudera/Desktop/Hive/hive_input.txt
My syntax to load data into the table:
Load DATA LOCAL INPATH '/home/cloudera/Desktop/Hive/hive_input.txt' INTO Table users
Yes, I removed the LOCAL as per @Bhaskar, and the path is my HDFS path where the file exists, not the underlying Linux path.
Load DATA INPATH '/user/cloudera/input_project/' INTO Table users;
You should change permissions on the folder that contains your file.
chmod -R 755 /home/user/
Another reason could be a file access issue. If you are running the Hive CLI as user01 and accessing a file (your INPATH) in user02's home directory, it will give you the same error.
So the solution could be one of the following:
1. Move the file to a location where user01 can access it (see the sketch after this list).
OR
2. Relaunch the Hive CLI after logging in with user02.
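A minimal sketch of the first option, assuming a world-readable staging location such as /tmp; the source path is hypothetical and the file and table names are taken from the question above:
# Copy the file somewhere the Hive user can read it and relax its permissions
cp /home/user02/hive_input.txt /tmp/hive_input.txt
chmod 644 /tmp/hive_input.txt
# Point LOAD DATA at the readable copy
hive -e "LOAD DATA LOCAL INPATH '/tmp/hive_input.txt' INTO TABLE users;"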
Check whether you are using a Sqoop import in your script and trying to import data into Hive from an empty table.
This may cause the Sqoop import to delete the HDFS location of the Hive table.
To confirm, run hdfs dfs -ls before and after you execute the Sqoop import; re-create the directory using hdfs dfs -mkdir.
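A sketch of that check, assuming the table lives under the default Hive warehouse path (the exact location is an assumption; adjust it to your table's LOCATION):
# Before the Sqoop import: confirm the table directory exists
hdfs dfs -ls /user/hive/warehouse/users
# ... run the Sqoop import ...
# After the import: if the directory is gone, re-create it
hdfs dfs -mkdir -p /user/hive/warehouse/users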
My path to the file in HDFS was data/file.csv; note that it is not /data/file.csv.
I specified the LOCATION during table creation as data/file.csv.
Executing
LOAD DATA INPATH '/data/file.csv' INTO TABLE example_table;
failed with the mentioned exception. However, executing
LOAD DATA INPATH 'data/file.csv' INTO TABLE example_table;
worked as desired.
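The difference is that a relative HDFS path is resolved against the current user's home directory (typically /user/<username>), while a leading slash makes it absolute. A quick check, assuming the default home-directory layout:
# These two should list the same file if the relative path is the one the table uses
hdfs dfs -ls data/file.csv
hdfs dfs -ls /user/$USER/data/file.csv
# The absolute path /data/file.csv points at the HDFS root instead
hdfs dfs -ls /data/file.csv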

What is the path for a bootstrapped file for a Pig job running in Amazon EMR

I bootstrap a data file in my EMR job. The bootstrapping succeeds and the file is copied to the /home/hadoop/contents/ folder with the right permissions.
However when I try to access it in the Pig script like below:
userdidstopick = load '/home/hadoop/contents/UserIdsToPick.txt' AS (uid:chararray);
I get an error that the input path does not exist:
hdfs://10.183.166.176:9000/home/hadoop/contents/UserIdsToPick.txt
When running Ruby jobs, the bootstrapped file was always accessible under the /home/hadoop/contents/ folder and everything worked for me.
Is it different for Pig?
By default, Pig on EMR is configured to read from HDFS rather than the local filesystem; the error shows the HDFS location it resolved.
There are 2 ways to solve this:
Either copy the file to S3 and load it directly from S3:
userdidstopick = load 's3_bucket_location/UserIdsToPick.txt' AS (uid:chararray);
Or you can first copy the file to HDFS (instead of the local filesystem) and then use that path the way you are doing today (see the sketch below).
I would prefer the first option.
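A minimal sketch of the second option, run from the master node's shell; the HDFS destination directory /contents is an assumption:
# Copy the bootstrapped file from the local filesystem into HDFS
hadoop fs -mkdir -p /contents
hadoop fs -put /home/hadoop/contents/UserIdsToPick.txt /contents/
# The Pig script can then load it with:
#   userdidstopick = load '/contents/UserIdsToPick.txt' AS (uid:chararray);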

How to upload a file to the Pentaho User Console server?

I need to:
1) Let the user select a file from his local PC
2) Upload that file to the Pentaho server
3) Process the file using a Kettle transformation
I tried with a CSV data source in Pentaho User Console (PUC) 5.0 but found no way to access it from a .ktr file uploaded to the PUC repository. I also tried to upload the CSV file to a folder and was still not able to access it from a .ktr file.
I think this requirement is valid:
Upload a CSV data file and a .ktr file to a PUC folder. The .ktr should be able to read the uploaded CSV file when it is executed from PUC.
Imagine a simple user with a CSV. Will he be able to upload the CSV file to a Linux host using WinSCP, FileZilla or another FTP tool?
We need to give our users an easy upload functionality, so after several hours of research (in the Pentaho source code) without a single line of Pentaho documentation, I found this test:
https://github.com/pentaho/pentaho-platform/blob/master/extensions/src/test/java/org/pentaho/platform/plugin/services/importer/PlatformImporterTest.java which showed me that a MIME type list should exist somewhere.
So after grepping for some keywords across the whole Pentaho folder, I found this file:
/my_apps/pentaho-server-ce-7.1.0.0-12/pentaho-server/pentaho-solutions/system/ImportHandlerMimeTypeDefinitions.xml
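For reference, a search along these lines turns the file up; the exact keyword is an assumption:
# Recursively list files under the server install that mention MIME type definitions
grep -ril "MimeTypeDefinition" /my_apps/pentaho-server-ce-7.1.0.0-12/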
With some intuition, I added this XML:
<ImportHandler class="org.pentaho.platform.plugin.services.importer.RepositoryFileImportFileHandler">
    <MimeTypeDefinitions>
        <MimeTypeDefinition mimeType="text/plain" >
            <extension>csv</extension>
        </MimeTypeDefinition>
    </MimeTypeDefinitions>
</ImportHandler>
At the bottom of the file:
<tns:ImportHandlerMimeTypeDefinitions xmlns:tns="http://www.pentaho.com/schema/" .....
    <ImportHandler ../>
    <ImportHandler ../>
    <!-- PUT CSV CONFIG HERE -->
</tns:ImportHandlerMimeTypeDefinitions>
Finally, I restarted my pentaho-server-ce-7.1.0.0-12 server and was able to upload my CSV file with these steps:
go to http://localhost:8080/pentaho
click on Browse Files
select some folder
click Upload (right side)
select the CSV file and click OK
Reading this CSV file from the .ktr is still pending...
I hope this helps.