I'm trying out Azure Data Factory v2 and I want to pipe data from an SQL source to an Oracle sink.
My problem is that I have several NOT NULL columns in my Oracle tables which specify, for example, the date and time at which a dataset was loaded into Oracle. These columns don't exist in the SQL tables, however, so when I start the pipeline I get an error that these columns can't be null in the Oracle sink.
My question is: is it possible to add these columns artificially during the pipeline run so that they get filled by the Data Factory?
Can I use a stored procedure or a custom activity for that?
Or do I have to create a PowerShell script which "hardcodes" the values I want to add to the source?
You can accomplish this in ADFv2 by using a query against your source dataset in the Copy activity to supply the missing values.
Using the table ex_employee, with the following configuration in each database:
Source table (SQL):
ID int not null,
Name nvarchar(25) not null
Sink table (Oracle):
ID number(p,0) not null,
Name nvarchar2(25) not null,
CreatedDate timestamp not null
In the Source configuration on your Copy activity in ADF, you would select the Query option under Use Query, and input a query, such as:
SELECT ID, Name, CURRENT_TIMESTAMP AS CreatedDate FROM ex_employee
This will take the existing values from your SQL table, and insert a default value into the result set, which can then be inserted into your Oracle sink.
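If the Oracle sink has more than one mandatory audit column, the same query can supply each of them. A rough sketch, where LoadedBy is a hypothetical extra NOT NULL column in the sink:
SELECT ID
, Name
, CURRENT_TIMESTAMP AS CreatedDate
, 'ADFv2' AS LoadedBy -- hypothetical literal for an assumed NOT NULL audit column
FROM ex_employee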
Does this column have a default value? Can you add a default to this column and then try? I'm not familiar with piping data to Oracle, but a similar approach is shown in the example below, adding a default value to a NOT NULL column.
drop table ex_employee
/
create table ex_employee (id number(1) null, name varchar2(100) default 'A' not null)
/
insert into ex_employee (id)
select 1 from dual
/
commit
/
select * from ex_employee where id = 1
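If the Oracle sink table already exists, you could also add the default to the existing column rather than recreating the table. A minimal sketch, assuming the sink table from the question with its CreatedDate column:
-- the default only kicks in when the insert leaves the column out entirely
alter table ex_employee modify (CreatedDate default CURRENT_TIMESTAMP);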
Related
What would be the best way to transfer a certain number of records daily from source to destination and then remove them from the source?
DB: SQL Server on cloud.
As the databases are on the same server, you can create a job that transfers the data to the other database.
Because the databases are on the same server you can access them easily, just by adding the database name before the table in the query. Look at the test that I did:
CREATE DATABASE [_Source]
CREATE DATABASE [_Destination]
CREATE TABLE [_Source].dbo.FromTable
(
some_data varchar(10)
)
CREATE TABLE [_Destination].dbo.ToTable
(
some_data varchar(10)
)
INSERT INTO [_Source].dbo.FromTable VALUES ('PAULO')
--THE JOB WOULD BE SOMETHING LIKE THIS:
-- INSERT INTO DESTINATION GETTING THE DATA FROM THE SOURCE
INSERT INTO [_Destination].dbo.ToTable
SELECT some_data
FROM [_Source].dbo.FromTable
-- DELETE FROM SOURCE
DELETE [_Source].dbo.FromTable
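If only a certain number of records should move each day, one way (a sketch, assuming a daily batch of 1000 rows) is to capture the deleted rows with OUTPUT and copy them across in the same transaction:
CREATE TABLE #batch (some_data varchar(10))

BEGIN TRANSACTION

-- remove a limited batch from the source and capture the deleted rows
DELETE TOP (1000)
FROM [_Source].dbo.FromTable
OUTPUT deleted.some_data INTO #batch

-- copy the captured batch into the destination
INSERT INTO [_Destination].dbo.ToTable (some_data)
SELECT some_data FROM #batch

COMMIT

DROP TABLE #batch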
I have an SSIS Package that runs a query and inserts values into a different table. Each time the package runs, I want to create a unique RunID for the results of that run. Here are the columns from my table. I have tried this using the Execute SQL Task and setting up the User::RunID variable, but I believe I am doing something wrong. Can anyone provide step-by-step instructions on how to do this?
You need 2 tables for this.
create table runs(
runID int identity primary key,
runDateTime datetime default getdate()
)
create table runReturns(
runReturnsID int identity primary key,
runID int not null,
[the rest of the data set]
)
In SSIS, start with an Execute SQL Task.
Add this query...
insert into runs (runDateTime) values(?);
select SCOPE_IDENTITY()
Map the parameter (?) to Now().
Change the result set to Single row and map the first column to the User::RunID variable.
Now create a data flow.
Put your query in a SQL source.
Add a Derived Column transformation and map a new column to the runID variable.
Finally, add a destination to your table and map accordingly.
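To check that each execution was tagged correctly, a query like this (a sketch against the two tables above) groups the loaded rows by run:
select r.runID, r.runDateTime, count(*) as rowsLoaded
from runs r
join runReturns rr on rr.runID = r.runID
group by r.runID, r.runDateTime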
Adding a completely SQL-based answer to complement the above, as an alternative since it needs no transformations at all:
Same 2 tables:
create table runs(
runID int identity primary key,
runDateTime datetime default getdate()
)
create table runReturns(
runReturnsID int identity primary key,
runID int not null,
[the rest of the data set]
)
Create a Job.
Add a step and base it on SQL.
declare @runID int;
insert into runs(runDateTime) values(getdate());
select @runID = scope_identity();
insert into runReturns(
runID, [rest of your columns])
select @runID
, [rest of your columns]
from [rest of your query]
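Filled in with hypothetical names (assuming the data set is just a Name column coming from a table called dbo.SourceData), the step would look something like:
declare @runID int;
insert into runs(runDateTime) values(getdate());
select @runID = scope_identity();

-- tag every row of this run with the new runID
insert into runReturns(runID, Name)
select @runID, Name
from dbo.SourceData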
An approach that might solve the issue is the system-scoped variable ServerExecutionID. By default, system-scoped variables are hidden in the Variables menu, but you can expose them by clicking the Grid Options button (rightmost of the 5).
If you reference that variable using the appropriate placeholder (? for OLE/ODBC or a named parameter for ADO) and map it to the variable, then every server execution will have a monotonically increasing number associated with it. Runs from Visual Studio or outside of the SSISDB will always have a value of 0 associated with them, but given that this is only encountered during development, this might address the issue.
Sample query based on the newer picture
INSERT INTO dbo.RunMapTable
SELECT ? AS RunID
, D.Name
FROM
(
VALUES ('A')
, ('B')
, ('C')
, ('D')
) D([Name]);
Parameter Mapping
0 -> System::ServerExecutionID
As an added bonus, you can then tie your custom logging back to the native logging in the SSISDB.
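For example, a query along these lines (a sketch, assuming RunID stores the ServerExecutionID as above) joins the custom log rows back to the catalog's execution metadata:
SELECT m.RunID
, m.Name
, e.start_time
, e.status
FROM dbo.RunMapTable AS m
JOIN SSISDB.catalog.executions AS e
ON e.execution_id = m.RunID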
I have a table in RDBMS like so:
create table test (sno number, entry_date date default sysdate).
Now I want to create a table in Hive with a similar structure, adding a default value to a column.
Hive currently doesn't support adding a default value to a column when creating a table.
As a workaround, load the data into a temporary table and use an insert overwrite table statement to add the current date and time into the main table.
Create a temporary table:
create table test (sno int);
Load data into the table:
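For example (a sketch, assuming the source data sits in a hypothetical local file):
load data local inpath '/tmp/test_data.csv' into table test; -- hypothetical path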
Create final table:
create table final_table (sno int, createDate string);
Finally load the data from temp test table to the final table:
insert overwrite table final_table select sno, FROM_UNIXTIME(UNIX_TIMESTAMP(), 'dd/MM/yyyy') from test;
Hive doesn't support DEFAULT fields
That doesn't mean you can't do it, though. It's just a two-step process: create one "staging" table, then insert into a second table, selecting that "default" value.
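On Hive 1.2 or later, current_timestamp() can supply that value directly. A minimal sketch, reusing the temp and final tables from the previous answer:
insert overwrite table final_table
select sno, cast(current_timestamp() as string)
from test;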
Adding a default value to a column while creating table in hive
Since you mention,
I have a table in RDBMS
You could also use your existing table, and use Sqoop to import the data into Hive.
I am trying to initialise my HSQLDB database with some default data, but seem to be having a problem with identity and timestamp columns.
I just realised that I probably wasn't clear what I meant when I said "script". I mean the command-line argument that you pass to HSQLDB to generate your database at startup. I can successfully run the query inside DbVisualizer or some other database management tool.
I have a table with the following definition:
create table TableBob (
ID int NOT NULL identity ,
FieldA varchar(10) NULL,
FieldB varchar(50) NOT NULL,
INITIAL_DT timestamp DEFAULT CURRENT_TIMESTAMP NOT NULL);
I can successfully create this table using the script, but trying to insert a record doesn't work. Below is what I would consider valid SQL for the insert (since the ID and INITIAL_DT fields are identity and default columns). Strangely, it inserts null into every column even though they are defined as NOT NULL...
e.g.
INSERT INTO TableBob (FieldA, FieldB) VALUES ('testFieldA', 'testFieldB');
Thanks for your help
Please try with HSQLDB's DatabaseManagerSwing (you can double click on the hsqldb.jar to start the database manager). First execute the CREATE TABLE statement, then the INSERT statement, finally the SELECT statement.
It should show the correct results.
If you want to use a script to insert data, use the SqlTool.jar which is available in the HSQLDB distribution zip package. See the guide: http://hsqldb.org/doc/2.0/util-guide/
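A minimal script for SqlTool might look like this (a sketch, reusing the table from the question):
-- create the table, insert a row relying on the identity and default columns, then check the result
CREATE TABLE TableBob (
ID int NOT NULL identity,
FieldA varchar(10) NULL,
FieldB varchar(50) NOT NULL,
INITIAL_DT timestamp DEFAULT CURRENT_TIMESTAMP NOT NULL);

INSERT INTO TableBob (FieldA, FieldB) VALUES ('testFieldA', 'testFieldB');

SELECT * FROM TableBob;

COMMIT;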
I know that when you insert a row into the db, it sets that column value to the current datetime.
Does the same apply when you run an update statement?
e.g.
table schema:
Id, Name, CreatedDate(getdate())
When I insert into the table with id = 1, name = 'john', it will set CreatedDate = current date.
If I run an update statement:
update table set name = 'john2' where id = 1
will it update the CreatedDate?
No, a DEFAULT constraint is only invoked on INSERT, and only when the column is omitted from the insert (or you use the DEFAULT keyword or DEFAULT VALUES). For an UPDATE, SQL Server is not going to look at your DEFAULT constraint at all. Currently you need a trigger (see How do I add a "last updated" column in a SQL Server 2008 R2 table?), but there have been multiple requests for this functionality to be built in.
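As a rough illustration of the trigger approach (a sketch, assuming a hypothetical dbo.Items table with an Id key and a ModifiedDate column to maintain):
CREATE TRIGGER dbo.trg_Items_SetModifiedDate
ON dbo.Items
AFTER UPDATE
AS
BEGIN
  SET NOCOUNT ON;
  -- stamp the current date/time on every row touched by the update
  UPDATE i
  SET ModifiedDate = GETDATE()
  FROM dbo.Items AS i
  JOIN inserted AS ins ON ins.Id = i.Id;
END;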
I've blogged about a way to trick SQL Server into doing this using temporal tables:
Maintaining LastModified Without Triggers
But this is full of caveats and limitations and was really only making light of multiple other similar posts:
A System-Maintained LastModifiedDate Column
Tracking Row Changes With Temporal Columns
How to add “created” and “updated” timestamps without triggers
Need a datetime column that automatically updates
Wow - hard to understand...
I think NO, based on the clues.
If you insert a record and omit a column that has a default value defined, then the default value will be stored instead of null.
An update will only update the columns specified in the statement.
UNLESS you have a trigger that does special logic - in which case you need to look at the trigger code to know the answer.
If your update statement tells it to update a column with getdate(), it will; but if you just update the name, for example, and you have a CreatedDate column (which was inserted with getdate()), that column won't be affected.
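So if you actually want the date column refreshed, you have to set it yourself in the update statement. A sketch, using a hypothetical table name MyTable for the example table from the question:
update MyTable set Name = 'john2', CreatedDate = getdate() where Id = 1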
You can achieve this using a DEFAULT constraint, as I did with the OrderDate field in the statement below.
CREATE TABLE Orders
(
O_Id int NOT NULL,
OrderNo int NOT NULL,
P_Id int,
OrderDate date DEFAULT GETDATE()
)
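For example, inserting a row without specifying OrderDate lets the default fill it in:
INSERT INTO Orders (O_Id, OrderNo, P_Id) VALUES (1, 1001, 10)

SELECT O_Id, OrderNo, P_Id, OrderDate FROM Orders -- OrderDate is populated by GETDATE()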