I'm new to ODI and trying to create my first mapping between a CSV file and a database table. I get the error below when selecting the resource name in the data store.
The directory XXXX specified in your schema does not exist
I verified that the directory provided in the physical schema is correct. The file is placed on the local machine where ODI is also installed.
I placed my file in different paths, but nothing works.
The physical schema should contain only the directory, e.g. C:\xxx\yyy.
Earlier I gave the full path including the file name in the physical schema (like C:\xxx\yyy\zzz.csv), which caused the error.
The datastore that you create in the corresponding model will then be called zzz.csv.
I created a database with my preferred location (/user/hive/) using the query below.
create database test
location "/user/hive/";
After creating the database, I checked the location /user/hive/ for a test.db directory using the command hadoop dfs -ls /user/hive. It was not there.
Later I created one more database with the default location using the query below.
create database test2;
For database test2, I can see a test2.db directory under the default warehouse directory /user/hive/warehouse/.
The /user/hive/test.db directory only got created when I explicitly specified it in the LOCATION field, as below.
create database test
location "/user/hive/test.db";
As I'm new to Hive, can anyone please explain:
Why didn't the test.db directory get created for my first query, where I specified the location as /user/hive/?
How does Hive behave when the LOCATION clause is specified?
NOTE:
I'm using the Cloudera quickstart VM
Hive Version: Hive 1.1.0-cdh5.13.0
This is expected behavior from Hive.
create database test location "/user/hive/";
Executing the above statement means you are creating the test database pointing directly at the existing /user/hive directory, which is why Hive hasn't created a test.db directory.
We need to explicitly mention the directory name the database should point to, i.e. create database test location "/user/hive/test.db"; only then does Hive create the test database pointing to the test.db directory.
In the case of the create database test2; statement, we are creating a database without specifying a location, so its directory is created under the default Hive warehouse location with the same name as the database.
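To illustrate, here is a minimal sketch using the names from the question (test3 is a hypothetical name so the three cases can coexist; the resulting locations can be checked with DESCRIBE DATABASE):
-- LOCATION points at an existing directory: Hive uses /user/hive as-is, so no test.db subdirectory is created.
CREATE DATABASE test LOCATION '/user/hive/';
DESCRIBE DATABASE test;     -- location: hdfs://.../user/hive
-- No LOCATION clause: Hive creates <hive.metastore.warehouse.dir>/test2.db automatically.
CREATE DATABASE test2;
DESCRIBE DATABASE test2;    -- location: .../user/hive/warehouse/test2.db
-- Explicit directory name in LOCATION: Hive creates /user/hive/test3.db if it does not exist.
CREATE DATABASE test3 LOCATION '/user/hive/test3.db';
DESCRIBE DATABASE test3;    -- location: hdfs://.../user/hive/test3.db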
First of all, please refer to the screenshot:
What is the use of the Logical Name, and why do we use it?
I attached the .mdf file, which is not in the SQL data path. This connection succeeded.
After I additionally gave the logical name, the connection now throws the error:
Error: Unable to open the physical file "". Operating system error 32.
Cannot attach the file "" as database "".
I googled for this, but no one suggested a working solution.
I tried this in Administrator mode and with Windows authentication.
In a given data folder, you cannot have two databases using the same physical .mdf file.
But with different .mdf or .ndf file names, you can create data files with the same logical names for different databases on the same SQL Server instance.
Could you please check whether your target database already has a data file with the name "AttachFile" before you do the restore?
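As a rough sketch (hypothetical database names and paths; only the distinction between the logical name and the physical file name is taken from the point above):
-- The logical name (NAME) can repeat across databases,
-- but the physical path (FILENAME) must be unique on disk.
CREATE DATABASE Db1
    ON (NAME = AttachFile, FILENAME = 'C:\Data\Db1.mdf')                  -- assumed folder
    LOG ON (NAME = AttachFile_log, FILENAME = 'C:\Data\Db1_log.ldf');
CREATE DATABASE Db2
    ON (NAME = AttachFile, FILENAME = 'C:\Data\Db2.mdf')                  -- same logical name, different file
    LOG ON (NAME = AttachFile_log, FILENAME = 'C:\Data\Db2_log.ldf');
-- Attaching an existing .mdf; operating system error 32 generally means the file
-- is still open elsewhere (e.g. attached to another database or instance).
CREATE DATABASE AttachedCopy
    ON (FILENAME = 'C:\Data\SomeDetached.mdf')                            -- assumed detached file
    FOR ATTACH;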
From reading about external tables and managed tables, I understood that we need to specify LOCATION while creating an external table, because Hive will create the table at the given location; in the case of a managed table, the default directory set in hive.metastore.warehouse.dir will be used.
Please correct me if anything is stated wrongly.
What confuses me is:
Is the LOCATION clause used to specify where the data already exists for an external table, or where to create the directory that will store the actual data?
If the LOCATION clause is used to specify where the data exists, then why do we use the INPATH clause in the LOAD statement?
The LOCATION clause in the DDL of an external table is used to specify the HDFS location where the data needs to be stored. Later on, when we query the table, the data is read from this specified path.
The path in LOAD DATA INPATH is the path of the source file from which the data is loaded into the table. The source can be either a local file path or an HDFS file path.
Hope I have cleared your confusion.
I have a directory containing multiple .xlsx files, and what I want to do is insert the data from those files into a database.
So far I have solved this by using tFileList -> tFileInputExcel -> tPostgresOutput
My problem begins when one of these files doesn't match the defined schema and returns an error, resulting in an interruption of the workflow.
What I need to figure out is whether it's possible to skip that file (moving it to another folder, for instance) and continue iterating over the rest of the files.
If I check the "Die on error" option, the process ends and doesn't process the rest of the files.
I would approach this by making your initial input schema on the tFileInputExcel be all strings.
After reading the file I would then validate the schema using a tSchemaComplianceCheck set to "Use another schema for compliance check".
You should then be able to connect a reject link from the tSchemaComplianceCheck to a tFileCopy configured to move the file to a new directory (if you want it to move the file rather than copy it, just tick "Remove source file").
Here's a quick example:
With the following set as the other schema for the compliance check (notice how it now checks that id and age are Integers):
And then to move the file:
Your main flow from the tSchemaComplianceCheck can carry on using just strings if you are inserting into a database. You might want to use a tConvertType to change things back to the correct data types afterwards if you are doing any processing that requires proper data types, or if you are using your tPostgresOutput component to create the table as well.
I'm using Hortonworks' Hue (more of a GUI that ties HDFS, Hive, and Pig together), and I want to load data that is already in HDFS into the table I just created.
Suppose the table's name is "test", and the path of the file which contains the data is:
/user/hdfs/test/test.txt
But I'm unable to load the data into the table. I tried:
load data local inpath '/user/hdfs/test/test.txt' into table test
But there's an error saying it can't find the file; there's no matching path.
I'm still so confused.
Any suggestions?
Thanks
As you said, you want to "load the data within the hdfs into my current created table".
But in your command you are using:
load data local inpath '/user/hdfs/test/test.txt' into table test
With the LOCAL keyword, Hive looks for the file in your local filesystem. But your file is in HDFS.
I think you need to remove the LOCAL keyword from your command.
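For example, since /user/hdfs/test/test.txt is an HDFS path, the command would look like this:
-- Without LOCAL, Hive reads the source path from HDFS and moves the file into the table's directory.
load data inpath '/user/hdfs/test/test.txt' into table test;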
Hope it helps...!!!
Since you are using Hue and the output says there is no matching path, I think you have to give the complete path.
For example:
load data local inpath '/home/cloudera/hive/Documents/info.csv' into table tablename;
In the same way, you can give the complete HDFS path in which the document resides.
You can use a file in any other format as well.
Remove the LOCAL keyword, as you are referring to the local file system.