I am trying to do a basic data flow task in SSIS 2012 where the destination is a Netezza table. I have a column named "Transaction_ID" that I plan to use as a primary key, and I have a sequence ready to populate it (Seq_Transaction_ID). However, I am not sure how exactly to assign the sequence.
For instance, I tried creating the table with the column defaulted to the next value of the sequence and got the error "you can only use a next value function within a target list". I'm also not sure how I would get the sequence into SSIS.
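For what it's worth, my understanding is that a "target list" usage means referencing the sequence directly in the INSERT, roughly like the sketch below (My_Transactions, Some_Column, and Staging_Transactions are just placeholder names, not my real objects):

INSERT INTO My_Transactions (Transaction_ID, Some_Column)
SELECT NEXT VALUE FOR Seq_Transaction_ID, Some_Column
FROM Staging_Transactions;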
Any ideas?
Thank you,
I have multiple CSV files and multiple tables. The table name is the file name, and the column names come from the first row of each CSV file.
Now I want the sink table to apply a default value whenever the incoming value is an empty string.
Consider my scenario,
employee:
id int, name varchar, is_active bit NULL
employee.csv:
id|name|is_active
1|raja|
Now, when I try to copy the CSV data to the PostgreSQL table, it throws an error.
The expected result is that the default value is used when the incoming value is empty.
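To be clear, I know I can declare a default on the PostgreSQL side, something like the sketch below (assuming is_active ends up as boolean there), but a column default only kicks in when the column is omitted from the insert entirely, not when an empty string is supplied:

ALTER TABLE employee
  ALTER COLUMN is_active SET DEFAULT false;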
You can use NULLIF in PostgreSQL:
NULLIF(argument_1,argument_2);
The NULLIF function returns a null value if argument_1 equals argument_2; otherwise it returns argument_1.
This way you can turn the empty string into NULL and, combined with COALESCE, replace it with some other value, as in the sketch below.
If your error is related to a type mismatch, consider casting the column first.
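A minimal sketch of how the two can be combined, assuming the value arrives as text in a staging table (staging_employee and the default of false are just illustrative):

SELECT id,
       name,
       COALESCE(NULLIF(is_active, ''), 'false')::boolean AS is_active
FROM   staging_employee;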
Thanks!
To reproduce the issue, I tried the scenario below, and the copy completed successfully. Here is what I used:
Source Dataset: employee.csv from Azure Blob Storage
Sink Dataset: here I used Azure SQL DB as the sink because of some limitations on my side, but since you are using PostgreSQL the setup is almost identical.
Copy Activity Settings:
Under the mapping settings there is type conversion; you have to import the schema, or you can add the mappings dynamically.
Output:
Alternative using Data Flow: if you have multiple data fields, you can use the derived column transformation to generate new columns in your data flow or to modify existing fields.
For more details, refer to Derived column transformation in mapping data flow.
You can even refer to this Microsoft Q&A post for more insights: Copy Task failure because of conversion failure
I'm working on an SSIS project that pulls data from Excel and loads it to an Oracle database every month. I plan to pull data from the Excel file and load it to an Oracle stage table. I will be using a merge statement because the data that gets loaded each month is a rolling 12-month list and the data can change, so I need to be able to INSERT when records don't match or UPDATE when they do. My control flow looks like this: Truncate Stage Table (to clear out the table from the last package run) ---> Data Flow from Excel to Stage Table ---> Merge to Target Table in Oracle.
My problem is that the data in the source Excel file doesn't have any unique columns to select a primary key or a composite key, as it is a possibility (although very unlikely) that a new record could have the exact same information. I am unable to utilize the "generated always as identity" because my SSIS package needs to truncate at the beginning of each job to clear out the Stage Table. This would generate the same ID numbers in the new load and create problems in the Target Table.
Any suggestions as to how I can get around this problem?
Welcome to SO and ETL. Instead of using a staging table, in SSIS use two sources: the Excel file and the existing production table. Sort both inputs and then perform a full outer merge join on the unique identifier. From there, use a derived column transformation to add a new column called 'Action' which marks each row as an INSERT, UPDATE, or DELETE based on which side of the join is NULL. So:
NULL from file means DELETE (not in file, in database)
NULL from database means INSERT (in file, not in database)
Not NULL for both means UPDATE (in file, in database)
From there, use a conditional split to route rows to either an OLE DB Destination (INSERT) or an OLE DB Command (UPDATE or DELETE), as sketched below. You can now remove the stage environment and the MERGE command from your process. This has the added benefit of moving the ETL load off the SQL Server, assuming SSIS is running on a separate server.
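For illustration, the OLE DB Command for the UPDATE branch would hold a parameterized statement along these lines (table and column names are made up; each ? is mapped to an input column of the data flow):

UPDATE dbo.TargetTable
SET    ColumnB     = ?   -- new value coming from the Excel source
WHERE  BusinessKey = ?;  -- the key used in the merge join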
Note: The sort transformation has the option to remove duplicates.
I have 4 columns in an Excel file. I need to assign the values from each row to the corresponding 4 variables so they can be substituted later in a query I am running on a server.
My question is: How to do that?
So far I have tried an Execute SQL Task in which I create a table with 4 columns (having the same names as those in my Excel file) and a data flow task to transfer the content of the Excel file to a Recordset Destination, which stores the results in a variable. I also created a Foreach Loop that contains my tasks. What am I missing; how can I do this?
Thanks
EDIT
Please find below a screenshot from my project. This is the overview.
The "Execute SQL Task" has a connection to Excel and runs the following statement:
CREATE TABLE tempVariableMapping
(
AsofDate varchar(20),
Assump_Set varchar(20),
MarketName varchar(20),
Portname varchar(20)
);
Then, in the transfer task (in the Recordset Destination), I'm assigning the variable name User::RecordSetOutput, which is a global variable of type Object.
In the Foreach Loop I'm using a Foreach ADO enumerator pointing to that User::RecordSetOutput variable; please find the variable mapping below.
The 4 variables in the variable mapping are the ones into which I want to pass the values from each row of the Excel file; a sketch of the server query they feed follows.
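For context, the query on the server would receive the values roughly like this (SomeServerTable is a placeholder; each ? would be mapped, in order, to the 4 variables above):

SELECT *
FROM   SomeServerTable
WHERE  AsofDate   = ?
  AND  Assump_Set = ?
  AND  MarketName = ?
  AND  Portname   = ?;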
The sequence container and the create-temp-table task are just placeholders; I haven't figured out the correct way yet. Everything below that works.
Sorry for the misunderstanding; I hope this is enough to get the picture.
Thank you for your time and help
I'm trying to do a direct load to a PostgreSQL table in Rails. My connection is established like this:
rc = connection.raw_connection
rc.exec("COPY research FROM STDIN WITH CSV HEADER")
Here is a portion of the code:
for key,values in valuehash
  entry = standard.model_constant::ATTRIBUTE_HASH[key.to_sym]
  puts "Loading #{entry[:display]}..."
  values.each do |value|
    # Ask the database for the next id from the sequence -- this is the line that fails
    id = ActiveRecord::Base.connection.execute("select nextval('research_id_seq'::regclass)").first['nextval']
    line = [id, value, entry[:display], standard.id, now, now]
    # put_copy_data returns false only when the connection is in nonblocking mode and would block
    until rc.put_copy_data( line.to_csv )
      ErrorPrinter.print " waiting for connection to be writable..."
      sleep 0.1
    end
  end
end
The line that begins id = fails. It says:
ActiveRecord::StatementInvalid: PG::UnableToSend: another command is already in progress
I don't want to write all these values to a CSV file and then read that file back in. I just want to send the data from memory straight to the database. I can't think of a valid reason why I couldn't run two connections to the same database against the same table, so I figured I must be doing something wrong. I've tried opening two raw connections, but it keeps giving me the same connection from the pool.
The best way to handle this is to have the database insert the sequence numbers for you, rather than you trying to fetch and add them.
Assuming that your table is defined something like:
CREATE TABLE research (
id SERIAL,
value TEXT,
...
)
then the table is defined such that if you insert a row and do not specify a value for id, it will default to the next value from the sequence automatically. This applies to COPY just as it does to any other form of inserting rows.
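For example (purely illustrative, and assuming the remaining columns allow NULLs or have their own defaults), an insert that leaves id out picks up the next value from research_id_seq automatically:

INSERT INTO research (value) VALUES ('some value');
-- id is populated from research_id_seq by the column default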
The trick is to make sure you are not supplying a value. To do this, structure your COPY command so that you do not include the id column:
COPY research(value,....) FROM STDIN WITH CSV HEADER
i.e. you have to specify the list of columns you will supply data for, and the order in which you will supply them.
Note that even if you have a header, it is not used on import to determine the columns present or their order. The header is just discarded.
Assuming you change the COPY statement as above, you would then just drop the id from the array of values you are building to send to the database.
I have a series of tasks that are very similar:
SELECT a,b FROM c
Lookup in another table and change value in column b.
Save the new value back to c, and if there is no match, send the row on to an error table.
That part is pretty straight forward and illustrated here:
Source ==> Lookup =match=> SQL Update command
                  =No match=> SQL Save Error command
(Hope you understand what I mean - but it works!)
I now have to repeat this a number of times, where my source-sql changes. So what I want to do is to insert a Script Component in front of the Source and set my User::Sql variable like:
Variables.Sql = "SELECT d, e FROM f"
All of the above is contained in a Data Flow. When I have created one I can then copy that one and only change the Sql variable in the script and then it should all work.
My problem is: when I insert the Script Component it asks me whether it is a Source, Destination, or Transformation script. And when it only sets the variable, it does not produce any rows for output and cannot be connected to my Source.
Anyone know how to make that work?
(I have simplified the above. I actually want to update multiple variables and use those in my Source, Lookup, and Error update as well - therefore it is not simpler to just change the SQL script in the initial Source! But if I can do the above, I will be able to achieve what I want :-))
You should set your variable containing the SQL query in the control flow, before you execute the dataflow.
Then you need to use that variable in an expression in your Data Flow; for example, the source can read its SQL command from the variable. You can parameterize the query used in the Lookup or any other parts of your Data Flow in the same way.
If your data flows really do all have the same structure, you could even generate a list of queries and call your data flow task in a loop, avoiding duplication of the same tasks.