How do I reference the table name from a CSV file in a dbt model (project hosted on GitHub)?
My current yml file only has "models: ...".
The CSV file to be referenced is named orders.csv and is uploaded under the tables -> datawarehouse folder.
I think you're referring to a seed, which is a feature where dbt can create a table in your warehouse using a .csv file that is stored alongside the code in your project.
After you add the .csv file to your seeds directory inside your project (or some other directory nested under /seeds/), you run dbt seed in your terminal to create the table from the data in the CSV. From your example, let's say the CSV is called orders.csv and is located at /seeds/tables/datawarehouse/orders.csv.
After that, you can select from the seed in other models by using ref with the seed's filename, so {{ ref('orders') }}.
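For example, a downstream model could select from the seed like this (a minimal sketch; the model file name is a placeholder):

-- models/stg_orders.sql (hypothetical file name)
-- dbt resolves ref('orders') to the table created by dbt seed.
select *
from {{ ref('orders') }}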
If you are using another tool (not dbt seed) to upload the CSV, you need to find the location of the resulting table in your data warehouse and add it as a source, specifying the database/schema/table name in a sources.yml file. Once the table is defined as a source, you select from it with {{ source('my_source', 'my_table') }}.
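As a rough sketch (the source name here is a placeholder for whatever you declare in sources.yml), a model selecting from that source would look like:

-- models/stg_orders_from_source.sql (hypothetical file name)
-- 'datawarehouse' stands in for the source name defined in sources.yml.
select *
from {{ source('datawarehouse', 'orders') }}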
I have created an external data source and a CSV file format. I am creating an external table using a CETAS script:
CREATE EXTERNAL TABLE test
WITH (
    LOCATION = 'test/data/',
    DATA_SOURCE = test_datasource,
    FILE_FORMAT = csv_format
)
AS SELECT * FROM dimcustomer;
But when I run this query, it generates many files with the extension .text.deflated.
Can we generate only one file, and can we give a name to the file that we generate?
Input appreciated. I am creating this external table to export Synapse data to a data lake container.
Tried creating an external table
Environment:
MS Azure:
Blob Container with multiple CSV files saved in a folder. This is my source.
Azure SQL Database. This is my target.
Goal:
Use Azure Data Factory and build a pipeline to "copy" all files from the container and store them in their respective tables in the Azure SQL database by automatically creating those tables.
How do I do that? I tried following this, but I just end up with tables incorrectly created in the database, where each table is created with a single column having the same name as the table name.
I believe I followed the instructions from that link pretty much as they are.
My CSV file is as follows; one column contains the table name.
The previous steps will not be repeated; they are the same as in the link.
At Step 3, inside the ForEach activity, we should add a Lookup activity to query the table name from the source dataset.
We can declare a String-type variable tableName beforehand, then set its value via the expression @activity('Lookup1').output.firstRow.tableName.
At the sink setting of the Copy activity, we can key in @variables('tableName').
ADF will auto-create the table for us.
The debug result is as follows:
I have an HDFS folder with many csv.gz files inside, all with the same schema. My customer needs to read the content of these files through Hive.
I tried to apply https://cwiki.apache.org/confluence/display/Hive/CompressedStorage. However, it moves the file, whereas I need it to stay in its initial directory.
Another problem is that I would have to load each file one by one; I would rather create a table from the directory and not manage files individually.
I have not mastered Hive at all. Is this possible?
Yes, this is possible via Hive. You can create an external table and reference the existing HDFS location containing the gzip files. The schema for the data should be specified during the table creation.
CREATE EXTERNAL TABLE my_data
(
    column_1 int,
    column_2 string
)
-- The files are plain CSV (gzip-compressed), so declare the delimiter and text storage.
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs:///my_data_folder_with_gzip_files';
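Hive reads the gzip-compressed text files transparently through Hadoop's compression codecs, so once the table exists you can query it in place, for example:

-- The .gz files stay in their original directory; Hive decompresses them on read.
SELECT column_1, column_2
FROM my_data
LIMIT 10;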
When I run the CREATE EXTERNAL TABLE query, I have to provide a directory for the LOCATION attribute. But if the directory I point to has more than one file, then it reads all of them. For example, if I put LOCATION 'dir1/', and dir1 contains file1 and file2, both files will be read.
To avoid this, I want to point to a single file. When I tried LOCATION 'dir1/file1', it gave me an error saying the file path is not a directory or it is unable to create one. Is there a way to point to just a single file?
If you want to load data from HDFS, try this:
LOAD DATA INPATH '/user/data/file1' INTO TABLE table1;
And if you want to load data from local storage:
LOAD DATA LOCAL INPATH '/data/file1' INTO TABLE table1;
There is a folder named "Sample" which contains a number of CSV files. How can I load all the CSV files into a Hive table dynamically?
For a normal insert we use:
load data inpath "file1.csv" into table Person;
Without hardcoding, can it be done for all the files?
You just need to pass in the directory name like:
load data inpath "/directory/name/here" into table Person;
Quoting the manual:
filepath can refer to a file (in which case Hive will move the file into the table) or it can be a directory (in which case Hive will move all the files within that directory into the table). In either case, filepath addresses a set of files.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
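As a rough sketch (the Person schema and HDFS path are hypothetical), the whole flow could look like:

-- Hypothetical schema; adjust the columns to match the CSV files.
CREATE TABLE Person (
    id INT,
    name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Loads every file under the directory into the table; note that LOAD DATA INPATH
-- moves the files into Hive's warehouse location.
LOAD DATA INPATH '/directory/name/here' INTO TABLE Person;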