The blob contains Hive partitioned table data, partitioned on year, month and day.
The container looks like Year=2016/Month=1/Day=1 containing file 0000_1, through Day=31 containing file 0000_31.
In this way there are 3 months inside the year, each month contains day folders, and each day folder contains a file.
Now we want to put that data into an Azure SQL DB table which is not partitioned.
If I understand it right, you have blobs with the structure
2016/03/01_001
2016/03/01_003
and the intent is to copy the data to Azure SQL. I am assuming that the blob structure is the same for all the files.
I suggest the following (a minimal pipeline sketch follows the list):
1: Use the GetMetadata activity to get all the blob info.
2: Use a ForEach activity to read one blob at a time.
3: Inside the ForEach, add a Copy activity with the blob as source and Azure SQL as sink.
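A minimal sketch of that pipeline in ADF JSON is below, assuming hypothetical dataset names BlobFolderDataset, BlobFileDataset and AzureSqlTableDataset, and a flat file listing; the nested Year=/Month=/Day= folders may instead need a wildcard folder path or nested iteration:

    {
        "name": "CopyBlobsToAzureSql",
        "properties": {
            "activities": [
                {
                    "name": "GetBlobList",
                    "type": "GetMetadata",
                    "typeProperties": {
                        "dataset": { "referenceName": "BlobFolderDataset", "type": "DatasetReference" },
                        "fieldList": [ "childItems" ]
                    }
                },
                {
                    "name": "ForEachBlob",
                    "type": "ForEach",
                    "dependsOn": [ { "activity": "GetBlobList", "dependencyConditions": [ "Succeeded" ] } ],
                    "typeProperties": {
                        "items": { "value": "@activity('GetBlobList').output.childItems", "type": "Expression" },
                        "activities": [
                            {
                                "name": "CopyBlobToSql",
                                "type": "Copy",
                                "inputs": [ { "referenceName": "BlobFileDataset", "type": "DatasetReference", "parameters": { "fileName": "@item().name" } } ],
                                "outputs": [ { "referenceName": "AzureSqlTableDataset", "type": "DatasetReference" } ],
                                "typeProperties": {
                                    "source": { "type": "DelimitedTextSource" },
                                    "sink": { "type": "AzureSqlSink" }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }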
I have a copy activity that takes a bunch of JSON files and merges them into a single JSON.
I would now like to copy the merged single JSON to Azure SQL DB. Is that possible?
OK, it appears to be working; however, the output in SQL is just countryCode and CompanyId.
I need to retrieve all the financial information in the JSON as well.
Azure Data Factory Copy Activity for JSON to Table in Azure SQL DB
I repro'd the same, and below are the steps.
Two JSON files are taken as the source.
Those files are merged into a single file using a Copy activity.
The merged JSON data is then taken as the source dataset in another Copy activity.
In the sink, a dataset for Azure SQL DB is created and the Auto create table option is selected (see the sink sketch below).
In the sink dataset, the Edit checkbox is selected and the sink table name is given.
Once the pipeline is run, the data is copied to the table.
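The Auto create table option corresponds to the tableOption setting on the Copy activity sink. A minimal sketch of the second Copy activity's typeProperties, with the source and sink types assumed from the scenario:

    "typeProperties": {
        "source": { "type": "JsonSource" },
        "sink": { "type": "AzureSqlSink", "tableOption": "autoCreate" }
    }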
I was trying to get data from an on-prem Hive source to Azure Data Lake Gen2 using Azure Data Factory.
As I need to get data for multiple tables, I created a file (e.g. tnames.txt) with all my table names and stored it in Data Lake Gen2.
In Azure Data Factory I created a Lookup activity and passed the tnames.txt file to it.
Then I added a ForEach activity to that Lookup activity, and inside the ForEach added a Copy activity.
In the Copy activity source I give a query to extract the data.
The sink is Data Lake Gen2.
Example code:
select * from tableName
Here the table name is passed dynamically from tnames.txt.
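As a sketch, the source query can be built with ADF's dynamic-content expression, assuming the column in tnames.txt is named tablename:

    select * from @{item().tablename}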
But after the data is copied into the data lake, the headers in the copied data look like
"tablename.columnname".
For example: the table name is Employee and a few of its columns are ID, Name, Gender, ....
My resulting file columns are like Employee.ID, Employee.Name, Employee.Gender, but my requirement is just the column name.
Basically the table name is appended to the column name.
How do I solve this issue, or is there any other way to get data for multiple tables in a single pipeline/copy activity?
Check the Mapping tab of your Copy activity. If the mapping is enabled, clear it and use auto-create table. It will auto-generate the schema according to the source schema, so there is no need to explicitly create the table with a defined schema. Let it auto-create the table; it will generate the required mapping automatically.
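In the Copy activity JSON this means there is no translator section under typeProperties, so ADF carries the source columns through as-is. A rough sketch under that assumption (the HiveSource and DelimitedTextSink types are inferred from the scenario):

    "typeProperties": {
        "source": {
            "type": "HiveSource",
            "query": { "value": "select * from @{item().tablename}", "type": "Expression" }
        },
        "sink": { "type": "DelimitedTextSink" }
    }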
I have my data available in a Smartsheet location, with month folders, and each month folder has weekly sheets (for example smartsheet/October/wk1, wk2, wk3, wk4 files). I want to load the data dynamically into a Hive table. Can someone suggest how to load it dynamically?
Environment:
MS Azure:
Blob container, multiple CSV files saved in a folder. This is my source.
Azure SQL Database. This is my target.
Goal:
Use Azure Data Factory to build a pipeline that "copies" all files from the container and stores them in their respective tables in the Azure SQL database by automatically creating those tables.
How do I do that? I tried following this, but I just end up with tables incorrectly created in the database, where each table is created with a single column that has the same name as the table.
I believe I followed the instructions from that link pretty much as they are.
My CSV file is as follows; one column contains the table name.
The previous steps will not be repeated; they are the same as in the link.
At Step 3, inside the ForEach activity, we should add a Lookup activity to query the table name from the source dataset.
We can declare a String type variable tableName beforehand, then set its value via the expression @activity('Lookup1').output.firstRow.tableName.
At the sink setting of the Copy activity, we can key in @variables('tableName').
ADF will auto-create the table for us, as sketched below.
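A rough sketch of the Copy activity's sink side under that approach, assuming the Azure SQL sink dataset exposes a tableName parameter (names are illustrative):

    "outputs": [
        {
            "referenceName": "AzureSqlSinkDataset",
            "type": "DatasetReference",
            "parameters": { "tableName": "@variables('tableName')" }
        }
    ],
    "typeProperties": {
        "sink": { "type": "AzureSqlSink", "tableOption": "autoCreate" }
    }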
The debug result confirms the table is created and the data copied.
I have a folder of CSV files separated by date in Google Cloud Storage. How can I upload it directly to BigQuery as a partitioned table?
You can do the following:
Create a partitioned table (for example: T).
Run multiple load jobs to load each day's data into the corresponding partition. For example, you can load data for May 15th, 2016 by specifying the destination table of the load as 'T$20160515'.
https://cloud.google.com/bigquery/docs/creating-partitioned-tables#restating_data_in_a_partition
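A minimal sketch with the bq command-line tool, assuming a dataset named mydataset, one GCS folder per day, and a schema file schema.json (all names hypothetical):

    # Create an ingestion-time partitioned table T
    bq mk --time_partitioning_type=DAY mydataset.T

    # Load one day's files into its partition via the $YYYYMMDD decorator
    bq load --source_format=CSV --skip_leading_rows=1 \
        'mydataset.T$20160515' \
        'gs://my-bucket/2016-05-15/*.csv' \
        ./schema.json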