SSIS (in SQL Server 2012): Upsert in Lookup component

I have > 10 packages that need to update/insert in the dataflow. I am able to do it by:
Lookup => Match output branch => OLE DB Command.
Lookup => No Match output branch => OLE DB Destination.
(http://www.rad.pasfu.com/index.php?/archives/46-SSIS-Upsert-With-Lookup-Transform.html)
(http://jahaines.blogspot.com/2009/09/sss-performing-upsert.html)
However, I was wondering if there is some way I can use the MERGE statement in the Lookup (or any other) component, so that I can do something like:
MERGE [DBPrac].[dbo].[TargetTable] AS tt
USING [SourceTable] AS st ON tt.Id = st.Id
WHEN MATCHED THEN --* Update the records, if record found based on Id.
    UPDATE
    SET tt.SSN = st.SSN
       ,tt.FirstName = st.FirstName
       ,tt.MiddleName = st.MiddleName
       ,tt.LastName = st.LastName
       ,tt.Gender = st.Gender
       ,tt.DateOfBirth = st.DateOfBirth
       ,tt.Email = st.Email
       ,tt.Phone = st.Phone
       ,tt.Comment = st.Comment
WHEN NOT MATCHED BY TARGET THEN --* Insert from source to target.
    INSERT (Id, SSN, FirstName, MiddleName, LastName, Gender, DateOfBirth, Email, Phone, Comment)
    VALUES (st.Id, st.SSN, st.FirstName, st.MiddleName, st.LastName, st.Gender, st.DateOfBirth, st.Email, st.Phone, st.Comment)
;
SELECT @@ROWCOUNT;
SET IDENTITY_INSERT [dbo].[TargetTable] OFF
GO
So far I tried:
In the Lookup component's "Advanced" pane, under "Custom query", I tried to use the above query, but I stumbled on the [SourceTable] reference: I don't know how to get hold of the data flow's input recordset inside the "Custom query" (or whether that is even possible).
Any help and/or pointer would be great.

Yes, you can use MERGE, but you need to load your data into a staging table first. This is the 'ELT' method - extract, load (into the database), transform - as opposed to the 'ETL' method - extract, transform (in the package), load (into the database).
I usually find the ELT method faster and more maintainable, if you don't mind working with SQL scripts. Certainly a single set-based update is faster than the row-by-row update that occurs in SSIS.
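For illustration, here is a sketch of that package layout, reusing the question's names ([StagingTable] is an assumed staging copy of the source columns, and the column list is abbreviated from the question's MERGE):
-- Step 1, Execute SQL Task: empty the staging table
TRUNCATE TABLE [DBPrac].[dbo].[StagingTable];

-- Step 2, Data Flow Task: plain source -> OLE DB Destination into
-- [StagingTable] (no Lookup, no OLE DB Command)

-- Step 3, Execute SQL Task: one set-based upsert from staging to target
MERGE [DBPrac].[dbo].[TargetTable] AS tt
USING [DBPrac].[dbo].[StagingTable] AS st ON tt.Id = st.Id
WHEN MATCHED THEN
    UPDATE SET tt.SSN = st.SSN, tt.FirstName = st.FirstName, tt.LastName = st.LastName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, SSN, FirstName, LastName)
    VALUES (st.Id, st.SSN, st.FirstName, st.LastName);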

If I understand your question correctly, just execute the MERGE statement using an Execute SQL Task. Then you don't need any Lookups. We use the same strategy for our warehouse's final load from staging.

SQL update multiple rows with different values where they match a value from a list

So perhaps the title is a little confusing. If you can suggest better wording please let me know and I'll update it.
Here's the issue. I've got a table with many thousands of rows and I need to update a few thousand of those rows to store the latest email data.
For example:
OldEmail@1.com => NewEmail@1.com
OldEmail@2.com => NewEmail@2.com
I've got a list of the old emails ('OldEmail@1.com','OldEmail@2.com') and a list of the new ones ('NewEmail@1.com','NewEmail@2.com'). The HOPE was to sort of do it simply with something like:
UPDATE Table
SET Email = ('NewEmail@1.com','NewEmail@2.com')
WHERE Email = ('OldEmail@1.com','OldEmail@2.com')
I hope that makes sense. Any questions just ask. Thanks!
You could use a case expression:
update mytable
set email = case email
    when 'OldEmail@1.com' then 'NewEmail@1.com'
    when 'OldEmail@2.com' then 'NewEmail@2.com'
end
where email in ('OldEmail@1.com','OldEmail@2.com')
Or better yet, if you have a large list of values, you might create a table to store them (like myref(old_email, new_email)) and join it in your update query, like so:
update t
set t.email = r.new_email
from mytable t
inner join myref r on r.old_email = t.email
The actual syntax for update/join varies across databases - the above is SQL Server syntax.
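For completeness, a quick sketch of building that reference table (SQL Server syntax; myref is the name suggested above):
CREATE TABLE myref (
    old_email VARCHAR(255) NOT NULL PRIMARY KEY,
    new_email VARCHAR(255) NOT NULL
);

INSERT INTO myref (old_email, new_email) VALUES
    ('OldEmail@1.com', 'NewEmail@1.com'),
    ('OldEmail@2.com', 'NewEmail@2.com');
After loading it, the update/join above applies unchanged.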
Subject to the exact syntax of your particular DBMS:
WITH cte AS (SELECT 'NewEmail@1.com' newvalue, 'OldEmail@1.com' oldvalue
             UNION ALL
             SELECT 'NewEmail@2.com', 'OldEmail@2.com')
UPDATE table
SET table.email = cte.newvalue
FROM cte
WHERE table.email = cte.oldvalue
or, if CTE is not available,
UPDATE table
SET table.email = cte.newvalue
FROM (SELECT 'NewEmail@1.com' newvalue, 'OldEmail@1.com' oldvalue
      UNION ALL
      SELECT 'NewEmail@2.com', 'OldEmail@2.com') cte
WHERE table.email = cte.oldvalue
Consider a prepared statement for updating rows in large batches.
Basically it works as follows:
the database compiles the query pattern you provide the first time and keeps the compiled result for the current connection (this depends on the implementation);
you then update all the rows by sending a short label for the prepared statement with different parameters, instead of sending the entire UPDATE statement several times for several updates;
the database parses the short label, which is linked to the pre-compiled result, then performs the updates;
the next time you perform row updates, the database may still use the pre-compiled result and complete the operations quickly (so the first step above can be skipped).
Here is a PostgreSQL example of PREPARE; many SQL databases (e.g. MariaDB, MySQL, Oracle) also support prepared statements.
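Since the linked example is not reproduced here, a minimal PostgreSQL sketch of the idea, assuming the mytable/email names used above:
-- Compile the query pattern once for the current connection
PREPARE update_email(text, text) AS
    UPDATE mytable SET email = $2 WHERE email = $1;

-- Each execution sends only the label and the parameters
EXECUTE update_email('OldEmail@1.com', 'NewEmail@1.com');
EXECUTE update_email('OldEmail@2.com', 'NewEmail@2.com');

-- Drop the prepared statement when done
DEALLOCATE update_email;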

Query a database based on result of query from another database

I am using SSIS in VS 2013.
I need to get a list of IDs from one database, and with that list of IDs, I want to query another database, i.e. SELECT ... from MySecondDB WHERE ID IN ({list of IDs from MyFirstDB}).
There are three methods to achieve this:
1st method - Using Lookup Transformation
First you have to add a Lookup Transformation as @TheEsisia answered, but there are more requirements:
In the Lookup you have to write the query that contains the ID list (e.g. SELECT ID FROM MyFirstDB WHERE ...)
You have to select at least one column from the lookup table
This will not filter rows; it only adds columns from the second table
To filter rows like WHERE ID IN ({list of IDs from MyFirstDB}) you have to do some work on the Lookup's error output. There are two ways:
Set the error handling to Ignore Row, so the columns added from the lookup will be NULL for unmatched rows; then add a Conditional Split that filters out rows with NULL values.
Assuming you have chosen col1 as the lookup column, you would use an expression similar to
ISNULL([col1]) == False
Or you can set the error handling to Redirect Row, so all unmatched rows are sent to the error output; if that output is not used, the data is filtered.
The disadvantage of this method is that all data is loaded and filtered during execution.
Also, when working over a network, the filtering is done on the local machine (with the 2nd method it is done on the server) after all the data has been loaded into memory.
2nd method - Using Script Task
To avoid loading all the data, you can use a workaround with a Script Task (the answer is written in VB.NET):
Assuming that the connection manager name is TestAdo, "Select [ID] FROM dbo.MyTable" is the query that gets the list of IDs, and User::MyVariableList is the variable where you want to store the list of IDs.
Note: This code will read the connection from the connection manager
Public Sub Main()

    Dim lst As New Collections.Generic.List(Of String)

    ' Read the connection from the package's connection manager
    Dim myADONETConnection As SqlClient.SqlConnection
    myADONETConnection = DirectCast(Dts.Connections("TestAdo").AcquireConnection(Dts.Transaction), _
                                    SqlClient.SqlConnection)

    If myADONETConnection.State = ConnectionState.Closed Then
        myADONETConnection.Open()
    End If

    ' Collect the list of IDs from the first database
    Dim myADONETCommand As New SqlClient.SqlCommand("Select [ID] FROM dbo.MyTable", myADONETConnection)
    Dim dr As SqlClient.SqlDataReader
    dr = myADONETCommand.ExecuteReader

    While dr.Read
        lst.Add(dr(0).ToString)
    End While

    ' Build the second query with the IN clause and store it in the variable
    Dts.Variables.Item("User::MyVariableList").Value = _
        "SELECT ... FROM ... WHERE ID IN(" & String.Join(",", lst) & ")"

    Dts.TaskResult = ScriptResults.Success

End Sub
And User::MyVariableList should then be used as the source (SQL command from variable).
3rd method - Using Execute Sql Task
Similar to the second method, but this builds the IN clause using an Execute SQL Task and then uses the whole query as the OLE DB Source:
Just add an Execute SQL Task before the Data Flow Task
Set the ResultSet property to Single row
Map User::MyVariableList in the Result Set tab
Use the following SQL command
DECLARE @str AS VARCHAR(4000)
SET @str = ''
SELECT @str = @str + CAST([ID] AS VARCHAR(255)) + ','
FROM dbo.MyTable
SET @str = 'SELECT * FROM MySecondDB WHERE ID IN (' + SUBSTRING(@str,1,LEN(@str) - 1) + ')'
SELECT @str
If the column has a string data type you should add quotation marks before and after the values, as below:
SELECT @str = @str + '''' + CAST([ID] AS VARCHAR(255)) + ''','
FROM dbo.MyTable
Make sure that you have set the Data Flow Task's DelayValidation property to True.
This is a classic case for using a Lookup Transformation. First, use an OLE DB Source to get data from the first database. Then, use a Lookup Transformation to filter this data-set based on the ID values from the second data-set. Here are the steps for using a Lookup Transformation:
In the General tab, select Full cache, OLE DB connection manager and Redirect rows to no match output as shown in the following picture. Notice that using Full cache provides great performance for your package.
General Setting
In the Connection tab, use OLE DB Connection Manager to connect to your second server. Then, you can either directly select the data-set with ID values or (as is shown in the picture below) you can use SQL code to select the IDs from the filtering data-set.
Connection:
Go to the Columns tab and map the ID columns from both datasets. For each record from your first data-set, it will check whether its ID is in the Available Lookup Columns. If it is, the row will go to the Match Output, otherwise to the No Match Output.
Match ID columns:
Click OK to close the Lookup editor. Then you need to select the Lookup Match Output.
Match Output:
The "best" answer depends on data volumes and source systems involved.
Many of the other answers propose building out a list of values based on clever concatenation within SQL Server. That doesn't work so well if the referenced system is Oracle, MySQL, DB2, Informix, Postgres, etc. There may be an equivalent concept, but there might not be.
For best performance, you need to filter against the second db before any of those rows ever hit the data flow. That means adding a filtering condition, as the others have suggested, to your source query. The challenge with this approach is that your query is going to be limited by some practical bounds that I don't remember. Ten, one hundred, a thousand values in your where clause is probably fine. A lakh, a million - probably not so much.
In the cases where you have large volumes of values to filter against the source table, it can make sense to create a table on that server and truncate and reload that table (execute sql task + data flow). This allows you to have all of the data local and then you can index the filter table and let the database engine do what it's really good at.
But, you say the source database is some custom solution that you can't make tables in. You can look at the above approach with temporary tables and within SSIS you just need to mark the connection as singleton/persisted (TODO: look this up). I don't much care for temporary tables with SSIS as debugging them is a nightmare I'd not wish upon my mortal enemy.
If you're still reading, we've identified why filtering in the source system might not be "doable", even if it will provide the best performance.
Now we're stuck with purely SSIS solutions. To get the best performance, do not select the table name in the drop-down - unless you absolutely need every column. Also, pay attention to your data types. Pulling LOB types (XML, text, image, (n)varchar(max), varbinary(max)) into the data flow is a recipe for bad performance.
The default suggestion is to use a Lookup Component to filter the data within the data flow, as long as your source system supports an OLE DB provider (or you can coerce the data into a Cache Connection Manager).
If you can't use a Lookup component for some reason, then you can explicitly sort your data in your source systems, mark your source components as such, and then use a Merge Join of type Inner Join in the data flow to only bring in matched data.
However, be aware that sorts in source systems are going to be sorted according to native rules. I ran into a situation where SQL Server was sorting based on the default ASCII sort and my DB2 instance, running on zOS, provided an EBCDIC sort. Which was great when my domain was only integers but went to hell in a handbasket when the keys became alphanumeric (AAA, A2B, and AZZ will sort differently based on this).
Finally, excluding the final paragraph, the above assumes you have integers. If you're performing string matching, you get an extra level of ugliness because different components may or may not perform a case sensitive match (sorting with case sensitive systems can also be a factor).
I would first create a String variable e.g. SQL_Select, at the Scope of the Package. Then I would assign that a value using an Execute SQL Task against the 1st database. The ResultSet property on the General page should be set to Single row. Add an entry to the Result Set tab to assign it to your Variable.
The SQL Statement used needs to be designed to return the required SELECT statement for your 2nd database, in a single row of text. An example is shown below:
SELECT
    'SELECT * from MySecondDB WHERE ID IN ( '
    + STUFF((SELECT TOP 5
                 ' , ''' + [name] + ''''
             FROM dbo.spt_values
             FOR XML PATH(''), TYPE).value('(./text())[1]', 'VARCHAR(4000)'),
            1, 3, '')
    + ' ) '
    AS SQL_Select
Remove the TOP 5 and replace [name] and dbo.spt_values with your column and table names.
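For instance, adapted to the tables in this question (a sketch assuming the IDs in dbo.MyTable are integers, which need no quoting):
SELECT
    'SELECT * FROM MySecondDB WHERE ID IN ( '
    + STUFF((SELECT ' , ' + CAST([ID] AS VARCHAR(255))
             FROM dbo.MyTable
             FOR XML PATH(''), TYPE).value('(./text())[1]', 'VARCHAR(4000)'),
            1, 3, '')
    + ' ) '
    AS SQL_Select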
Then you can use the variable SQL_Select in a downstream task e.g. an OLE DB Source against database 2. OLE DB Sources and OLE DB Command Tasks both let you specify a Variable as the SQL Statement source.
You could add a LinkedServer between the two servers. The SQL command would be something like this:
EXEC sp_addlinkedserver @server='SRV' --or any name you want
EXEC sp_addlinkedsrvlogin 'SRV', 'false', null, 'username', 'password'
SELECT * FROM SRV.CatalogNameInSecondDB.dbo.SecondDBTableName s
INNER JOIN FirstDBTableName f on s.ID = f.ID
WHERE f.ID IN (list of values)
EXEC sp_dropserver 'SRV', 'droplogins'

Is there a syntax error with this SQL Server query? Can I not use "target.@1"?

Hard coded, this works:
var insertCommand1 = ("MERGE INTO Leaderboard WITH (HOLDLOCK) AS target USING (SELECT * FROM Scores WHERE WeekNumber = 7) AS Source ON (target.id = source.id) WHEN MATCHED THEN UPDATE SET target.Id = source.Id, target.Week7 = source.weeklyScore WHEN NOT MATCHED THEN INSERT (Id, Week7) VALUES (source.Id, source.weeklyScore);");
db.Execute(insertCommand1);
This does not work:
var insertCommand1 = ("MERGE INTO Leaderboard WITH (HOLDLOCK) AS target USING (SELECT * FROM Scores WHERE WeekNumber = @0) AS Source ON (target.id = source.id) WHEN MATCHED THEN UPDATE SET target.Id = source.Id, target.@1 = source.weeklyScore WHEN NOT MATCHED THEN INSERT (Id, @2) VALUES (source.Id, source.weeklyScore);");
db.Execute(insertCommand1, weeknum, weekstring, weekstring);
The error says there's a syntax error near @1. What could this be?
I've already debugged to make sure the value to weeknum and weekstring were correct.
Working with SQL Server in VS 2015.
Schema for the 2 tables-
Leaderboard(Id, Week1, Week2, Week3, Week4, Week5,
Week6, Week7, Week8, Week9, Week10)
with Id as the primary key
Scores(Id, WeekNumber, weeklyScore)
with Id and WeekNumber as the primary key
You are trying to set the field name using a parameter, and @parameters are for values.
, target.@1 = source.weeklyScore
Should be
, target.something = @1
It looks like you're trying to use a parameter as a schema object name instead of a value. This doesn't work, as you've discovered. Parameters are just for values.
If you need a dynamic schema object name, be aware of two things:
It could impact performance, though probably not by much.
SQL injection is a significant concern.
The first one you can measure if it becomes a problem, but I doubt it will. The second one can be handled just by being careful. The simple rule with SQL injection is not "always use parameters for everything" but "never execute user-modified values as code".
For schema objects, you already have a finite set of possible values. So you can build a list of known values in your code. This isn't user-modified, so it's safe. (Maybe it's hard-coded, maybe you auto-generate it from the DB schema, that's up to you.)
With the variable, check whether it matches a value in the list. If it doesn't, that's an error, and the code should raise an appropriate exception or otherwise handle that error case. If it does match an element of the known, finite list of safe, non-user-modifiable values, use that matched value from the list in your query:
var query = string.Format("SELECT SomeTable.{0} FROM SomeTable ...", knownList[x]);
(Or however you want to structure it, hopefully you get the idea.)
Then with that dynamically generated query, you can add your parameter values and you're all set.
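The same whitelist idea can also be applied on the database side. Here is a hedged T-SQL sketch against the Leaderboard schema from the question (the Week7/42/1 values are illustrative):
-- Candidate column name; must match the known list before being spliced in
DECLARE @col SYSNAME = N'Week7';
IF @col NOT IN (N'Week1', N'Week2', N'Week3', N'Week4', N'Week5',
                N'Week6', N'Week7', N'Week8', N'Week9', N'Week10')
    THROW 50000, N'Unexpected column name', 1;

-- Only the validated, quoted name is concatenated; values stay as parameters
DECLARE @sql NVARCHAR(400) =
    N'UPDATE Leaderboard SET ' + QUOTENAME(@col) + N' = @score WHERE Id = @id';
EXEC sp_executesql @sql, N'@score INT, @id INT', @score = 42, @id = 1;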

Getting the inserted rows to update another table

I have a table which stores records which require to be inserted into another database. Once these values are inserted, I then need to mark these records as processed to prevent them being re-processed.
DECLARE #InsertedValues TABLE (
[ITEMNMBR] nchar(31),
[ITEMDESC] nchar(101),
[ITMSHNAM] nchar(15),
[ITMGEDSC] nchar(11),
[UOMSCHDL] nchar(11),
[ALTITEM1] nchar(31),
[ALTITEM2] nchar(31),
[USCATVLS_1] nchar(11),
[USCATVLS_2] nchar(11),
[USCATVLS_3] nchar(11),
[USCATVLS_6] nchar(11),
[ABCCODE] int,
[ROW_ID] int
)
-- INSERT NEW INVENTORY ITEMS INTO DB
INSERT INTO TABLE1..IV00101 (ITEMNMBR,ITEMDESC,ITMSHNAM,ITMGEDSC,UOMSCHDL,ALTITEM1,ALTITEM2,USCATVLS_1,USCATVLS_2,USCATVLS_3,USCATVLS_6,ABCCODE)
OUTPUT
INSERTED.[ITEMNMBR],
INSERTED.[ITEMDESC],
INSERTED.[ITMSHNAM],
INSERTED.[ITMGEDSC],
INSERTED.[UOMSCHDL],
INSERTED.[ALTITEM1],
INSERTED.[ALTITEM2],
INSERTED.[USCATVLS_1],
INSERTED.[USCATVLS_2],
INSERTED.[USCATVLS_3],
INSERTED.[USCATVLS_6],
INSERTED.[ABCCODE],
U.[ROW_ID] INTO #InsertedValues
SELECT U.[ITEMNMBR],U.[ITEMDESC],U.[ITMSHNAM],U.[ITMGEDSC],U.[UOMSCHDL],U.[ALTITEM1],U.[ALTITEM2],U.[USCATVLS_1],U.[USCATVLS_2],U.[USCATVLS_3],U.[USCATVLS_6],U.[ABCCODE]
FROM
DYNAMICS..TABLE2 AS U
WHERE
U.[ProcessedFlag] = 0 AND
U.[Action] = 'I' AND
U.[DestinationCompany] = 'COMPANY1' AND
U.[DestinationTable] = 'IV00101'
As it stands, this query doesn't work: it complains about the U.[ROW_ID] column in the OUTPUT clause, which makes sense. So my problem is: how do I capture which rows were inserted so that I can then run the following query?
UPDATE DYNAMICS..TABLE2
SET [ProcessedFlag] = 1, [ProcessedDateTime] = GETDATE()
FROM #InsertedValues AS U
INNER JOIN DYNAMICS..TABLE2 AS R ON U.[ROW_ID] = R.[ROW_ID]
I'd consider using eConnect, since messing with GP tables directly is not a good idea (though inserting into IV00101 should be OK since it's the inventory master ... but still!)
What version of GP are you using? GP10 and GP2010 support web services which have methods that allow you to insert an inventory item; otherwise you can use eConnect and provide XML files to the eConnect entry point, which it will process. It also provides validation and error handling. You can use message queuing too if need be.
Are you trying to do an import from your own holding table into the GP tables or something like that?
I do plenty of GP and integration where I work :)
It's not possible to get the number of updated rows with standard SQL, but probably any database allows you to do it. Still, it won't be easy to help you if you don't say which RDBMS you are using and from where you are calling the SQL instructions - I mean a script executed in some db client application, or an application you're developing in T-SQL, PL/SQL, PL/pgSQL, Java, PHP, C/C++, C#, VB or whatever language, probably using a db library you should also name.
UPDATE DYNAMICS..TABLE2
SET [ProcessedFlag] = 1, [ProcessedDateTime] = GETDATE()
WHERE
DYNAMICS..TABLE2.[ProcessedFlag] = 0 AND
DYNAMICS..TABLE2.[Action] = 'I' AND
DYNAMICS..TABLE2.[DestinationCompany] = 'COMPANY1' AND
DYNAMICS..TABLE2.[DestinationTable] = 'IV00101'
Just update the same set of records you selected in the first place.
Just a suggestion: you should use identity columns when you have this kind of scenario, since the use of @@IDENTITY/SCOPE_IDENTITY() then becomes pretty easy.
Anyway, I'd suggest using a trigger only if this table doesn't have multiple simultaneous insertions, as triggers have a few disadvantages of their own.
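For illustration, a minimal sketch of that identity-based pattern (a hypothetical table, not the GP schema from the question):
CREATE TABLE dbo.Demo (
    Id INT IDENTITY(1,1) PRIMARY KEY,
    Payload NVARCHAR(50) NOT NULL
);

INSERT INTO dbo.Demo (Payload) VALUES (N'abc');

-- SCOPE_IDENTITY() returns the identity value generated by the last insert
-- in the current scope (safer than @@IDENTITY, which triggers can skew)
SELECT SCOPE_IDENTITY() AS NewId;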

Operation must use an updatable query. (Error 3073)

I have written this query:
UPDATE tbl_stock1 SET
tbl_stock1.weight1 = (
select (b.weight1 - c.weight_in_gram) as temp
from
tbl_stock1 as b,
tbl_sales_item as c
where
b.item_submodel_id = c.item_submodel_id
and b.item_submodel_id = tbl_stock1.item_submodel_id
and b.status <> 'D'
and c.status <> 'D'
),
tbl_stock1.qty1 = (
select (b.qty1 - c.qty) as temp1
from
tbl_stock1 as b,
tbl_sales_item as c
where
b.item_submodel_id = c.item_submodel_id
and b.item_submodel_id = tbl_stock1.item_submodel_id
and b.status <> 'D'
and c.status <> 'D'
)
WHERE
tbl_stock1.item_submodel_id = 'ISUBM/1'
and tbl_stock1.status <> 'D';
I got this error message:
Operation must use an updatable query. (Error 3073) Microsoft Access
But if I run the same query in SQL Server it will be executed.
Thanks,
dinesh
I'm quite sure the JET DB Engine treats any query with a subquery as non-updateable. This is most likely the reason for the error and, thus, you'll need to rework the logic and avoid the subqueries.
As a test, you might also try to remove the calculation (the subtraction) being performed in each of the two subqueries. This calculation may not be playing nicely with the update as well.
Consider this very simple UPDATE statement using Northwind:
UPDATE Categories
SET Description = (
SELECT DISTINCT 'Anything'
FROM Employees
);
It fails with the error 'Operation must use an updateable query'.
The Access database engine simply does not support the SQL-92 syntax using a scalar subquery in the SET clause.
The Access database engine has its own proprietary UPDATE..JOIN..SET syntax, but it is unsafe because, unlike a scalar subquery, it doesn't require values to be unambiguous. If values are ambiguous then the engine silently 'picks' one arbitrarily, and it is hard (if not impossible) to predict which one will be applied even if you were aware of the problem.
For example, consider the existing Categories table in Northwind and the following daft (non-)table as a target for an update (daft but simple to demonstrate the problem clearly):
CREATE TABLE BadCategories
(
CategoryID INTEGER NOT NULL,
CategoryName NVARCHAR(15) NOT NULL
)
;
INSERT INTO BadCategories (CategoryID, CategoryName)
VALUES (1, 'This one...?')
;
INSERT INTO BadCategories (CategoryID, CategoryName)
VALUES (1, '...or this one?')
;
Now for the UPDATE:
UPDATE Categories
INNER JOIN (
SELECT T1.CategoryID, T1.CategoryName
FROM Categories AS T1
UNION ALL
SELECT 9 - T2.CategoryID, T2.CategoryName
FROM Categories AS T2
) AS DT1
ON DT1.CategoryID = Categories.CategoryID
SET Categories.CategoryName = DT1.CategoryName;
When I run this I'm told that two rows have been updated - funny, because there's only one matching row in the Categories table. The result is that the matching Categories row now has the '...or this one?' value. I suspect it has been a race to see which value gets written to the table last.
The SQL-92 scalar subquery is verbose when there are multiple clauses in the SET and/or the WHERE clause matches the SET's clauses, but at least it eliminates ambiguity (plus a decent optimizer should be able to detect that the subqueries are close matches). The SQL:2003 Standard introduced MERGE, which can be used to eliminate the aforementioned repetition, but needless to say Access doesn't support that either.
The Access database engine's lack of support for the SQL-92 scalar subquery syntax is for me its worst 'design feature' (read 'bug').
Also note the Access database engine's proprietary UPDATE..JOIN..SET syntax cannot anyhow be used with set functions ('totals queries' in Access-speak). See Update Query Based on Totals Query Fails.
Keep in mind that if you copy over a query that originally had queries or summary queries as part of it, then even though you delete those queries and only have linked tables left, the query will (mistakenly) act like it still has non-updateable fields and will give you this error. You simply need to re-create the query as you want it, but it is an insidious little glitch.
You are updating weight1 and qty1 with values that are in turn derived from weight1 and qty1 (respectively). That's why MS-Access is choking on the update. It's probably also doing some optimisation in the background.
The way I would get around this is to dump the calculations into a temporary table, and then update the first table from the temporary table.
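A rough sketch of that workaround in Access SQL, reusing the question's table names (tmp_stock_calc is an assumed name; run each statement as its own query):
SELECT b.item_submodel_id,
       (b.weight1 - c.weight_in_gram) AS new_weight1,
       (b.qty1 - c.qty) AS new_qty1
INTO tmp_stock_calc
FROM tbl_stock1 AS b
INNER JOIN tbl_sales_item AS c ON b.item_submodel_id = c.item_submodel_id
WHERE b.status <> 'D' AND c.status <> 'D';

UPDATE tbl_stock1
INNER JOIN tmp_stock_calc AS t ON tbl_stock1.item_submodel_id = t.item_submodel_id
SET tbl_stock1.weight1 = t.new_weight1,
    tbl_stock1.qty1 = t.new_qty1
WHERE tbl_stock1.item_submodel_id = 'ISUBM/1'
  AND tbl_stock1.status <> 'D';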
There is no error in the code; the error is thrown for the following reason:
please check whether you have given read/write permission to the MS Access database file, i.e. whether the folder where the file is stored (say Folder1) is read-only.
If you store the database (MS Access file) in a read-only folder, the connection cannot be forcefully opened for writing when your application runs. For example, almost all files under C:\Program Files are set read-only, so changing that permission on the file or its containing folder solves this problem.
In the query properties, try changing the Recordset Type to Dynaset (Inconsistent Updates)