Adding new columns to a table in Azure Data Factory - sql

I have a CSV file in blob storage with the following format:
**Column,DataType**
Acc_ID, int
firstname, nvarchar(500)
lastname, nvarchar(500)
I am trying to read this file in Data Factory, loop through the column names, and check whether each column already exists in the destination table; if not, I want to create the missing columns in the SQL table.
I know that we can use the following SQL query to create columns that do not exist.
IF NOT EXISTS (
    SELECT *
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'contact_info' AND COLUMN_NAME = 'acc_id'
)
BEGIN
    ALTER TABLE contact_info
    ADD acc_id int NULL
END;
But I am not sure whether we can read the CSV file and pass its column names into the above SQL query from a Data Factory pipeline. Any suggestions, please?

You can create a column if it does not exist using the Pre-copy script in the Copy data activity.
• Table columns before executing the pipeline.
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'contact_info'
• Source file:
ADF pipeline:
Using the Lookup activity, get the list of columns and data types by connecting the source dataset to the source file.
Output of lookup activity:
Connect the lookup output to the ForEach activity to loop all the values from the lookup.
@activity('Lookup1').output.value
Add a Copy data activity inside the ForEach activity and connect the source to the SQL table. Select Query instead of Table in the Use query property. Write a query that returns no rows, since this Copy activity is used only to add a column to the table if it does not exist.
select * from dbo.contact_info where 1= 2
In the Copy data activity sink, connect the sink dataset to the SQL table, and in the Pre-copy script write your query to add the new column. Here, use the current ForEach item values (Column, DataType) instead of hardcoding them, as below.
@{concat('IF NOT EXISTS ( SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = ','''','contact_info','''',' AND COLUMN_NAME = ','''',item().Column,'''',') ALTER TABLE contact_info ADD ',item().Column,' ', item().DataType,' NULL')}
When the pipeline is executed, the ForEach loop runs through all the values in the lookup output and creates each column in the table if it does not exist.
Columns in the table after the pipeline is executed:
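Outside ADF, the same idea can be sketched in a few lines of Python: read the Column/DataType pairs from the CSV and emit one guarded ALTER statement per row, which is exactly what the ForEach + Pre-copy script combination produces (a minimal sketch; the table name `contact_info` and the CSV layout come from the question):

```python
import csv
import io

# CSV content as described in the question: a header row, then Column,DataType pairs.
csv_text = """Column,DataType
Acc_ID, int
firstname, nvarchar(500)
lastname, nvarchar(500)
"""

def build_precopy_scripts(csv_text, table="contact_info"):
    """Generate one guarded ALTER TABLE statement per CSV row,
    mirroring what the ForEach + Pre-copy script does in ADF."""
    rows = csv.DictReader(io.StringIO(csv_text))
    scripts = []
    for row in rows:
        col, dtype = row["Column"].strip(), row["DataType"].strip()
        scripts.append(
            "IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS "
            f"WHERE TABLE_NAME = '{table}' AND COLUMN_NAME = '{col}') "
            f"ALTER TABLE {table} ADD {col} {dtype} NULL"
        )
    return scripts

for s in build_precopy_scripts(csv_text):
    print(s)
```

Each generated statement is self-guarding, so running the batch repeatedly is safe, just like the pre-copy script in the pipeline.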

Related

How to generate a script to create all tables with different schema

I have about 200 tables in a schema.
I need to replicate these tables in a new backup schema with an automatic procedure.
I would like to create a procedure to dynamically recreate all the Tables in a Schema (potentially dynamic number of tables and columns) on a different schema.
I can cycle through all the tables and build the SELECT * INTO dbo_b.TABLE FROM dbo.TABLE statement, but I get the error:
Column 'AMBIENTE' has a data type that cannot participate in a columnstore index.
I created a view that simply does SELECT * FROM TABLE and tried to perform SELECT * INTO dbo_b.TABLE FROM dbo.VIEW, but I got the same issue.
It works only if I create dbo_b.Table first and INSERT INTO it, so I would need to generate a script that automatically cycles through all the tables in my schema and creates them in the new schema.
It's not a one time job, it should run every day so I cannot do it manually.
Seems we hit the same issue.
You can try to loop over all the tables and create each one in the new schema like this:
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
           WHERE TABLE_NAME = 'YYYY' AND TABLE_SCHEMA = 'XXXX')
    DROP TABLE [ZZZZ].[YYYY];

CREATE TABLE [ZZZZ].[YYYY]
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS
(SELECT * FROM XXXX.YYYY);
Let me know. BR
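Since this has to run every day, the per-table script above can be generated mechanically. A minimal Python sketch (the schema names `dbo`/`dbo_b` and the sample table names are illustrative; in practice the table list would come from a query against INFORMATION_SCHEMA.TABLES):

```python
def backup_script(tables, src_schema="dbo", dst_schema="dbo_b"):
    """For every table in the source schema, emit a guarded DROP of the
    backup copy followed by a CTAS into the backup schema, using the
    Synapse-style syntax from the answer above."""
    stmts = []
    for t in tables:
        stmts.append(
            f"IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES "
            f"WHERE TABLE_NAME = '{t}' AND TABLE_SCHEMA = '{dst_schema}') "
            f"DROP TABLE [{dst_schema}].[{t}];"
        )
        stmts.append(
            f"CREATE TABLE [{dst_schema}].[{t}] "
            f"WITH (DISTRIBUTION = ROUND_ROBIN, HEAP) AS "
            f"(SELECT * FROM [{src_schema}].[{t}]);"
        )
    return stmts

# Hypothetical table names, just to show the output shape.
for stmt in backup_script(["CUSTOMERS", "ORDERS"]):
    print(stmt)
```

The generated batch can then be executed by whatever scheduler drives the daily job.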

DROP TABLE by CONCAT table name with VALUE from another SELECT [SQLite]

I was wondering how I can drop a table whose name is built by concatenating a value selected from another table.
This is what I am trying to figure out:
DROP TABLE SELECT 'table' || (select value from IncrementTable)
So basically table name is table6 for example.
Goal is: eg.. DROP TABLE table6
You can't do this directly. Table and column names have to be known when the statement is byte-compiled; they can't be generated at runtime. You have to figure out the table name and generate the appropriate statement string in the program using the database, and execute it.
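For example, in Python's sqlite3 the two steps look like this (a minimal sketch; the table names follow the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Set up the tables from the question: table6 plus an IncrementTable
# holding the current suffix.
cur.execute("CREATE TABLE table6 (x)")
cur.execute("CREATE TABLE IncrementTable (value)")
cur.execute("INSERT INTO IncrementTable VALUES (6)")

# Step 1: fetch the suffix at runtime.
suffix = cur.execute("SELECT value FROM IncrementTable").fetchone()[0]

# Step 2: build the DROP statement string in the program, then execute it.
cur.execute(f"DROP TABLE table{suffix}")

# table6 is gone now; only IncrementTable remains.
remaining = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(remaining)  # ['IncrementTable']
```

Note that table names cannot be bound as `?` parameters for the same reason: parameters are values, not identifiers, so string construction in the host program is the only option.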

Hive - Create Table statement with 'select query' and 'fields terminated by' commands

I want to create a table in Hive using a select statement which takes a subset of a data from another table. I used the following query to do so :
create table sample_db.out_table as
select * from sample_db.in_table where country = 'Canada';
When I looked into the HDFS location of this table, there are no field separators.
But I need to create a table with filtered data from another table along with a field separator. For example I am trying to do something like :
create table sample_db.out_table as
select * from sample_db.in_table where country = 'Canada'
ROW FORMAT SERDE
FIELDS TERMINATED BY '|';
This is not working though. I know the alternate way is to create a table structure with field names and the "FIELDS TERMINATED BY '|'" command and then load the data.
But is there any other way to combine the two into a single query that enables me to create a table with filtered data from another table and also with a field separator ?
Put ROW FORMAT DELIMITED ... in front of AS SELECT. Do it like this (adapt the query to yours):
hive> CREATE TABLE ttt row format delimited fields terminated by '|' AS select *,count(1) from t1 group by id ,name ;
Query ID = root_20180702153737_37802c0e-525a-4b00-b8ec-9fac4a6d895b
here is the result
[root@hadoop1 ~]# hadoop fs -cat /user/hive/warehouse/ttt/**
2|\N|1
3|\N|1
4|\N|1
As you can see in the documentation, when using the CTAS (Create Table As Select) statement, the ROW FORMAT statement (in fact, all the settings related to the new table) goes before the SELECT statement.

How to use 'COPY FROM VERTICA' on same database to copy data from one table to another

I want to copy data from one table to another in Vertica using the COPY FROM VERTICA command. I have a table with a large amount of data, and I want to select a subset of it (where field1 = 'some val', etc.) and copy it to another table.
The source table has columns of type LONG VARCHAR, and I want to copy these values into another table with different column types such as VARCHAR, DATE, and BOOLEAN. I want only valid values to be copied into the destination table; invalid data should be rejected.
I tried to move the data using an INSERT command like the one below, but the problem is that even a single row with invalid data terminates the whole process (and nothing gets copied into the destination table).
INSERT INTO cb.destTable(field1, field2, field3)
Select cast(field1 as varchar), cast(field2 as varchar), cast(field3 as int)
FROM sourceTable Where Id = 2;
How this can be done?
COPY FROM VERTICA and EXPORT TO VERTICA are intended to copy data between clusters. Even if you looped the connection back to the same database, you would not be able to use rejects, as they are not supported by COPY FROM VERTICA. The mappings are strict, so if a value cannot be coerced, the copy will fail.
You'll have to do one of the following:
• INSERT ... SELECT ... WHERE <conditions to filter out data that won't coerce>
• INSERT ... SELECT <expressions that massage data that won't coerce>
• Export the data to a file using vsql (you can turn off headers/footers, turn off padding, set the delimiter to something that doesn't appear in your data, etc.), then use COPY to load it back in.
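The first option, filtering out rows that won't coerce before the INSERT ... SELECT, can be illustrated with SQLite standing in for Vertica (a sketch only; the GLOB digit check is one possible validity filter, and the table layout is assumed from the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Source with loosely typed text columns (standing in for LONG VARCHAR),
# including one row whose field3 is not a valid integer.
cur.execute("CREATE TABLE sourceTable (Id INTEGER, field1 TEXT, field2 TEXT, field3 TEXT)")
cur.executemany("INSERT INTO sourceTable VALUES (?, ?, ?, ?)", [
    (2, "a", "b", "10"),
    (2, "c", "d", "not-a-number"),   # would fail coercion to int
    (3, "e", "f", "20"),             # filtered out by Id = 2
])
cur.execute("CREATE TABLE destTable (field1 TEXT, field2 TEXT, field3 INTEGER)")

# INSERT ... SELECT ... WHERE: keep only rows whose field3 is all digits,
# so the cast to INTEGER cannot fail and no bad row aborts the load.
cur.execute("""
    INSERT INTO destTable (field1, field2, field3)
    SELECT field1, field2, CAST(field3 AS INTEGER)
    FROM sourceTable
    WHERE Id = 2
      AND field3 GLOB '[0-9]*'
      AND field3 NOT GLOB '*[^0-9]*'
""")
print(cur.execute("SELECT * FROM destTable").fetchall())  # [('a', 'b', 10)]
```

In Vertica itself the filter would use the appropriate predicates or REGEXP_LIKE for each target type, but the shape of the statement is the same.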
Try exporting it into a CSV file:
=> \o output.csv
=> SELECT CAST(field1 AS VARCHAR), CAST(field2 AS VARCHAR), CAST(field3 AS INT) FROM sourceTable WHERE Id = 2;
=> \o
Then use the COPY command to load it back into the desired table:
COPY <destination table> FROM '<csv path>' DELIMITER '<comma or your configured delimiter>' NO ESCAPE NULL '<NULL indicator>' SKIP 1;
Are they both in the same Vertica database? If so an alternative is:
DROP TABLE IF EXISTS cb.destTable;
CREATE TABLE cb.destTable AS
SELECT field1::VARCHAR, field2::VARCHAR, field3::VARCHAR
FROM sourceTable WHERE Id = 2;

load multiple tables in the same database from SQL to QlikView

I am trying to load all the tables in "ABCD_BKP" whose names start with TEST_.
The tables in my database are as follows:
ABCD_BKP
TEST_1
TEST_2
TEST_3
I am trying to load them as below, but it does not seem to work.
SELECT *
FROM "ABCD_BKP".dbo.TEST_*
To load all the tables you first need a list of the table names, then loop through this list and load the tables one by one.
For example, if you are using MS SQL, your script would be:
// Get all tables in "ABCD_BKP"
TableNames:
SQL
SELECT
TABLE_NAME
FROM
"ABCD_BKP".INFORMATION_SCHEMA.TABLES
;
// Filter only table names that start with "TEST_"
Test_TableNames:
LOAD DISTINCT
TABLE_NAME as TestTables
RESIDENT
TableNames
WHERE
LEFT(TABLE_NAME, 5) = 'TEST_'
;
DROP TABLE TableNames; // the table with all table names is no longer needed
FOR i = 1 TO FieldValueCount('TestTables') // loop through all "TEST_*" tables
LET vTableName = FieldValue( 'TestTables', $(i) ); // current iteration table name
$(vTableName): //give our QV table the same name as the SQL table
SQL
SELECT
*
FROM
"ABCD_BKP".dbo.$(vTableName) // load the sql table in QV
;
NEXT
DROP TABLE Test_TableNames; // drop the QV table that contains the list with the "TEST_" tables
The SQL to get the list of tables in the database is different for each database.
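The same loop pattern, outside QlikView, looks like this in Python with sqlite3 standing in for the SQL database (a sketch; sqlite_master plays the role of INFORMATION_SCHEMA.TABLES, and the table contents are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Stand-in database with the tables from the question plus one extra.
for name in ("TEST_1", "TEST_2", "TEST_3", "OTHER"):
    cur.execute(f"CREATE TABLE {name} (x)")
    cur.execute(f"INSERT INTO {name} VALUES ('{name.lower()}')")

# Step 1: get the list of table names.
tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]

# Step 2: filter the names starting with TEST_ and load each one.
loaded = {}
for t in tables:
    if t.startswith("TEST_"):
        loaded[t] = cur.execute(f"SELECT * FROM {t}").fetchall()

print(sorted(loaded))  # ['TEST_1', 'TEST_2', 'TEST_3']
```

As in the QlikView script, wildcard table names in the FROM clause are not possible, so the name list has to be fetched first and each table loaded individually.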