I'm trying to investigate when & why certain rows are getting deleted in a SQL 2005 database. I've started building a trigger to log some information when a row is deleted.
My trigger is activated when row(s) are deleted from a certain table. I have it set up to log a timestamp in another logging table when the delete occurs. I'd also like to log the data that was deleted, but would prefer not to hassle with writing code for each field and value.
I know when data is deleted it can be seen (temporarily) in the "Deleted" table in SQL Server. So right after a delete, I could "SELECT * FROM Deleted" and see the data. I would like to take the contents of this table, and turn it into one large text blob that I can just save into a TEXT field in my logging table.
So, in simpler terms: is there a way I can take a recordset of one or more rows and turn it into a single string variable, all within SQL commands in my trigger? Bonus points if I can include column names.
Thanks
I would stay away from anything that runs too long inside a trigger. That includes querying the system tables at runtime just to discover a static table layout (because you don't want to write the code yourself) so you can build a string.
I do this type of thing all the time, but mostly with stored procedure parameters. I have most of this in a template I use.
Create this function; it will display the column value nicely within quotes, or show NULL:
CREATE FUNCTION [dbo].[QuoteNull]
(
    @InputStr varchar(8000) --value to pad
)
RETURNS varchar(8000)
AS
/*
TEST WITH:
----------
PRINT ' dbo.QuoteNull(null)      ->'+dbo.QuoteNull(null)+'<-'
PRINT ' dbo.QuoteNull(''apple'') ->'+dbo.QuoteNull('apple')+'<-'
PRINT ' dbo.QuoteNull(123)       ->'+dbo.QuoteNull(123)+'<-'
PRINT ' dbo.QuoteNull(GETDATE()) ->'+dbo.QuoteNull(GETDATE())+'<-'
PRINT ' dbo.QuoteNull(GETDATE()) ->'+dbo.QuoteNull(CONVERT(varchar(23),GETDATE(),121))+'<-'
*/
BEGIN
    RETURN COALESCE(''''+@InputStr+'''','null')
END
GO
paste this into your code:
INSERT INTO YourLogTable
(xxx,yyy,zzz,ColumnTextValue)
SELECT
xxx,yyy,zzz,'values:'
+' '+RTRIM('ColumnNameInt ')+'='+dbo.QuoteNull( ColumnNameInt )
+', '+RTRIM('ColumnNameVarchar ')+'='+dbo.QuoteNull( ColumnNameVarchar )
+', '+RTRIM('ColumnNameChar ')+'='+dbo.QuoteNull( ColumnNameChar )
+', '+RTRIM('ColumnNameDate ')+'='+dbo.QuoteNull(CONVERT(varchar(23),ColumnNameDate ,121))
FROM DELETED
Make sure you have one line for each column in your table (if you have more, just delete the extras later). If you want to see any dates in detail, use CONVERT as shown above.
run this query:
select sc.name
FROM syscolumns sc INNER JOIN sysobjects so ON sc.id = so.id
where UPPER(so.name)=UPPER('YourTableName') order by sc.colorder desc
Take the output in SQL Server Management Studio (in text output mode) and Alt+Left-Click-Drag a rectangle over the column names, then copy that column-based selection.
Go back to your code and Alt+Click-Drag a rectangle over the complete "ColumnName..." values in the left column of the INSERT statement, then paste. Because you made a column selection, it replaces only that column and leaves the code to the left and right unchanged. Do the same for the "ColumnName..." values on the right of the INSERT, and you now have an INSERT that will build the data you want but will not waste too much time in the trigger.
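As an aside on the original requirement: if one text blob per DELETE (rather than name=value pairs) would do, SQL Server 2005 can serialize the entire DELETED table, column names included, in a single statement via FOR XML AUTO. A minimal sketch, where the trigger, source table, and log table names are placeholders:

CREATE TRIGGER trg_YourTable_LogDelete ON dbo.YourTable
AFTER DELETE
AS
BEGIN
    -- one log row per DELETE statement; all deleted rows serialized as XML text
    INSERT INTO dbo.YourLogTable (LogTime, RowData)
    SELECT GETDATE(),
           (SELECT * FROM DELETED AS Row FOR XML AUTO);
END

Note that FOR XML AUTO omits NULL columns from its output, so this is not a byte-perfect archive of each row.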
With GDPR looming in the UK, and a team of 15 users who have already created a mass of SELECT statements (in excess of 2,000) across 15 different databases, I need a way to capture an already-written SELECT statement and substitute surrogate/pseudo data WITHOUT rewriting every procedure we already have.
There will be a need to run the original team members' scripts as normal, and there will be requirements to pseudonymise the values.
My current thinking is to create a stored procedure along the lines of:
CREATE PROC Pseudo (@query NVARCHAR(MAX))
INSERT INTO #TEMP FROM @query
Do something with the data via a mapping table of real and surrogate/pseudo data.
UPDATE #TEMP
SET FNAME = (SELECT Pseudo_FNAME FROM PseudoTable PT WHERE #TEMP.FNAME = PT.FNAME)
SELECT * FROM #TEMP
So that team members can run their normal SELECT statements and get pseudo data simply by using:
EXEC Pseudo ('SELECT FNAME FROM CUSTOMERS')
The problem I'm having is you can't use:
INSERT INTO #TEMP FROM @query
So I tried via CTE:
WITH TEMP AS (@query)
...but I can't use that either.
Surely there's a way of capturing the recordset from an existing SELECT into a table so that I can amend it, without having to change the original script. Please bear in mind that each SELECT statement will be unique, so I can't hard-code column or value lists.
Does anyone have any ideas or a working example of how best to tackle this?
There are other lengthy methods I could externally do to carry this out but I'm trying to resolve this within SQL if possible.
So after a bit of deliberation I resolved it.
I passed the original SELECT SQL to a stored procedure that used dynamic SQL which, when executed, INSERTed the data into a temp table; I then UPDATEd from that dataset.
The end result was EXEC Pseudo('Original SQL;').
I will have to set some basic rules around certain columns for now as a short-term fix, but at least users can create non-pseudo and pseudo data as required without masses of reworking :)
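For anyone hitting the same wall, the standard pattern for this is INSERT ... EXEC, which captures the result set of dynamic SQL into a pre-created temp table. A sketch of how that can look, assuming the caller's SELECT returns a single FNAME column (the table and column names are illustrative; the temp table's shape must match each query's result set, which remains the limitation when every SELECT is unique):

CREATE PROC Pseudo (@query NVARCHAR(MAX))
AS
BEGIN
    CREATE TABLE #TEMP (FNAME VARCHAR(100));

    -- INSERT ... EXEC captures the dynamic SELECT's output without rewriting it
    INSERT INTO #TEMP (FNAME)
    EXEC sp_executesql @query;

    -- swap real values for pseudo values via the mapping table
    UPDATE #TEMP
    SET FNAME = (SELECT Pseudo_FNAME FROM PseudoTable PT WHERE #TEMP.FNAME = PT.FNAME);

    SELECT * FROM #TEMP;
END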
I am using SSIS in VS 2013.
I need to get a list of IDs from one database and, with that list of IDs, query another database, i.e. SELECT ... FROM MySecondDB WHERE ID IN ({list of IDs from MyFirstDB}).
There are three methods to achieve this:
1st method - Using Lookup Transformation
First you have to add a Lookup Transformation as @TheEsisia answered, but there are more requirements:
In the Lookup you have to write the query that contains the ID list (e.g. SELECT ID FROM MyFirstDB WHERE ...)
You have to select at least one column from the lookup table
This alone will not filter rows; it only adds values from the second table
To filter rows (WHERE ID IN ({list of IDs from MyFirstDB})) you have to do some work on the Lookup's no-match (error) output. There are two ways:
Set the error handling to Ignore Row, so the columns added from the lookup will be NULL for unmatched rows; then add a Conditional Split that filters out the rows where those values are NULL.
Assuming you have chosen col1 as the lookup column, use an expression similar to:
ISNULL([col1]) == False
Or you can set the error handling to Redirect Row, so all unmatched rows are sent to the error output; leave that output unconnected and the data is filtered.
The disadvantage of this method is that all the data is loaded and then filtered during execution.
Also, if you are working over a network, the filtering is done on the local machine (in the 2nd method it happens on the server) after all the data has been loaded into memory.
2nd method - Using Script Task
To avoid loading all the data, you can use a workaround with a Script Task (example written in VB.NET):
Assume that the connection manager name is TestAdo, that "Select [ID] FROM dbo.MyTable" is the query that gets the list of IDs, and that User::MyVariableList is the variable where you want to store the list of IDs.
Note: This code will read the connection from the connection manager
Public Sub Main()

    Dim lst As New Collections.Generic.List(Of String)
    Dim myADONETConnection As SqlClient.SqlConnection
    myADONETConnection = DirectCast(Dts.Connections("TestAdo").AcquireConnection(Dts.Transaction), SqlClient.SqlConnection)

    If myADONETConnection.State = ConnectionState.Closed Then
        myADONETConnection.Open()
    End If

    Dim myADONETCommand As New SqlClient.SqlCommand("Select [ID] FROM dbo.MyTable", myADONETConnection)
    Dim dr As SqlClient.SqlDataReader
    dr = myADONETCommand.ExecuteReader

    While dr.Read
        lst.Add(dr(0).ToString)
    End While

    Dts.Variables.Item("User::MyVariableList").Value = "SELECT ... FROM ... WHERE ID IN(" & String.Join(",", lst) & ")"

    Dts.TaskResult = ScriptResults.Success

End Sub
Then User::MyVariableList should be used as the source (SQL command from variable).
3rd method - Using Execute Sql Task
Similar to the second method, but this builds the IN clause using an Execute SQL Task and then uses the whole query as the OLE DB Source:
Just add an Execute SQL Task before the DataFlow Task
Set the ResultSet property to Single row
Select User::MyVariableList as Result Set
Use the following SQL command
DECLARE @str AS VARCHAR(4000)
SET @str = ''

SELECT @str = @str + CAST([ID] AS VARCHAR(255)) + ','
FROM dbo.MyTable

SET @str = 'SELECT * FROM MySecondDB WHERE ID IN (' + SUBSTRING(@str,1,LEN(@str) - 1) + ')'

SELECT @str
If the column has string data type you should add quotation before and after values as below:
SELECT @str = @str + '''' + CAST([ID] AS VARCHAR(255)) + ''','
FROM dbo.MyTable
Make sure that you have set the Data Flow Task's DelayValidation property to True.
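For a dbo.MyTable holding IDs 1, 3 and 7, the @str built above ends up as a single-row string like:

SELECT * FROM MySecondDB WHERE ID IN (1,3,7)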
This is a classic case for using a Lookup Transformation. First, use an OLE DB Source to get data from the first database. Then, use a Lookup Transformation to filter this data-set based on the ID values from the second data-set. Here are the steps for using a Lookup Transformation:
In the General tab, select Full Cache, OLE DB Connection Manager, and Redirect rows to no match output, as shown in the following picture. Notice that using Full Cache provides great performance for your package.
General settings:
In the Connection tab, use OLE DB Connection Manager to connect to your second server. Then, you can either directly select the data-set with ID values or (as is shown in the picture below) you can use SQL code to select the IDs from the filtering data-set.
Connection:
Go to the Columns tab and select the ID columns from both datasets. For each record from your first data-set, it will check whether its ID is in the Available Lookup Column. If it is, the record will go to the Matching output, otherwise to the No Matching output.
Match ID columns:
Click on OK to close the LookUp. Then you need to select the LookUp Match Output.
Match Output:
The "best" answer depends on data volumes and source systems involved.
Many of the other answers propose building out a list of values based on clever concatenation within SQL Server. That doesn't work so well if the referenced system is Oracle, MySQL, DB2, Informix, Postgres, etc. There may be an equivalent concept, but there might not be.
For best performance, you need to filter against the second db before any of those rows ever hit the data flow. That means adding a filtering condition, as the others have suggested, to your source query. The challenge with this approach is that your query is going to be limited by some practical bounds that I don't remember. Ten, one hundred, a thousand values in your where clause is probably fine. A lakh, a million - probably not so much.
In the cases where you have large volumes of values to filter against the source table, it can make sense to create a table on that server and truncate and reload that table (execute sql task + data flow). This allows you to have all of the data local and then you can index the filter table and let the database engine do what it's really good at.
But, you say the source database is some custom solution that you can't make tables in. You can look at the above approach with temporary tables; within SSIS you just need to set the connection manager's RetainSameConnection property to True so the temp table survives between tasks. I don't much care for temporary tables with SSIS, as debugging them is a nightmare I'd not wish upon my mortal enemy.
If you're still reading, we've identified why filtering in the source system might not be "doable", even if it will provide the best performance.
Now we're stuck with purely SSIS solutions. To get the best performance, do not just select the table name in the drop-down - unless you absolutely need every column. Also, pay attention to your data types: pulling LOBs (XML, text, image, (n)varchar(max), varbinary(max)) into the data flow is a recipe for bad performance.
The default suggestion is to use a Lookup Component to filter the data within the data flow, as long as your source system supports an OLE DB provider (or you can coerce the data into a Cache Connection Manager).
If you can't use a Lookup component for some reason, then you can explicitly sort your data in your source systems, mark your source components as such, and then use a Merge Join of type Inner Join in the data flow to only bring in matched data.
However, be aware that sorts in source systems are going to be sorted according to native rules. I ran into a situation where SQL Server was sorting based on the default ASCII sort and my DB2 instance, running on zOS, provided an EBCDIC sort. Which was great when my domain was only integers but went to hell in a handbasket when the keys became alphanumeric (AAA, A2B, and AZZ will sort differently based on this).
Finally, excluding the final paragraph, the above assumes you have integers. If you're performing string matching, you get an extra level of ugliness because different components may or may not perform a case sensitive match (sorting with case sensitive systems can also be a factor).
I would first create a String variable e.g. SQL_Select, at the Scope of the Package. Then I would assign that a value using an Execute SQL Task against the 1st database. The ResultSet property on the General page should be set to Single row. Add an entry to the Result Set tab to assign it to your Variable.
The SQL Statement used needs to be designed to return the required SELECT statement for your 2nd database, in a single row of text. An example is shown below:
SELECT
    'SELECT * from MySecondDB WHERE ID IN ( '
    + STUFF( (
          SELECT TOP 5 ' , ''' + [name] + ''''
          FROM dbo.spt_values
          FOR XML PATH(''), TYPE
      ).value('(./text())[1]', 'VARCHAR(4000)')
      , 1, 3, '')
    + ' ) '
    AS SQL_Select
Remove the TOP 5 and replace [name] and dbo.spt_values with your column and table names.
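Run as-is, before those replacements, the statement produces a single row of text shaped like this (the quoted values are illustrative placeholders):

SELECT * from MySecondDB WHERE ID IN ( 'AAA' , 'BBB' , 'CCC' , 'DDD' , 'EEE' )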
Then you can use the variable SQL_Select in a downstream task e.g. an OLE DB Source against database 2. OLE DB Sources and OLE DB Command Tasks both let you specify a Variable as the SQL Statement source.
You could add a LinkedServer between the two servers. The SQL command would be something like this:
EXEC sp_addlinkedserver @server='SRV' --or any name you want
EXEC sp_addlinkedsrvlogin 'SRV', 'false', null, 'username', 'password'
SELECT * FROM SRV.CatalogNameInSecondDB.dbo.SecondDBTableName s
INNER JOIN FirstDBTableName f on s.ID = f.ID
WHERE f.ID IN (list of values)
EXEC sp_dropserver 'SRV', 'droplogins'
I have a process that builds reports based upon dynamic SQL queries stored in tables. When I originally wrote it as a proof of concept, it worked using a cursor-style process - it was actually first done as a script using Do/While. The proof was then moved to T-SQL in the same format and worked, other than the fact that it ran like crap because it was iterating one record at a time.
I rewrote the process to leverage the point of using SQL - mass selection/manipulation of records - but I haven't been able to get the calculation retrieval to work in this manner, and have just been using statically written CASE statements.
Tables:
Items list - just a friendly label for what each item is.
SourceQuery - nvarchar fields containing actual SQL select statements
Calculations - varchar fields containing data such as DateAdd(m,-1,CalcDate) and DATEADD(month, DATEDIFF(month, 0, CalcDate), 0) and a lot of other calculations based upon other values. (CalcDate is a select value which is going into the TempTable currently)
The dynamic execution takes the SourceQuery, builds, then executes it into a temp table:
DECLARE @SourceQuery nvarchar(max)

create table #TempTable...

select distinct @SourceQuery=SourceQuery from vewTaskCombo where ....
Set @SourceQuery = 'Insert into #TempTable...' + @SourceQuery
Execute (@SourceQuery)
The above gets the SourceQuery into the temp table but doesn't currently do anything with calculations - as mentioned, that is currently handled by an UPDATE statement using a CASE statement to decide which date calculation to use.
What I would like to do is eliminate the CASE statement and allow it to grab the calculation directly from the table. When doing this as a single item iteration it was fine because we could assign the calculation to a variable.
The above is just a snippet of the pieces - there are several other table elements that are all joined together to create the query and decide the calculations.
Edit response:
The issue I am having is how to get the calculation from the table to execute as a statement. For example, if I inner join the calculation table, I can grab which calculation it should be (DateAdd...), but it comes through only as a varchar and can no longer be executed as a calculation. Before, because it was iterating one row at a time, the current calculation was grabbed into a variable and executed that way; now I am doing it all in bulk. I can insert the formula into the temp table as another value, but can't figure out how to get it to execute as a calculation.
The goal is to execute the calculation that is stored in the table. I can select the calculation into the temp table, but can't figure out how to execute it as a calculation without putting it into a separate variable - and since there can be more than one calculation, I can't just assign it to a single variable (without using a cursor to go through each calculation one at a time, which I am trying to avoid).
Currently the statically written case statements look something like:
Update #TempTable
Set StartDate = CASE WHEN TaskThresholdID=2 THEN DateAdd(m,-1,CalcDate)
                     WHEN TaskThresholdID=4 THEN DateAdd(m,-1,CalcDate)
                     ...
                END,
    DueDate   = CASE WHEN TaskThresholdID=2 THEN DateAdd(d,4,CalcDate)
                     WHEN TaskThresholdID=4 THEN CalcDate
                     ...
                END
The goal is to grab that calculation from the table and not have it statically written into the procedure.
And thank you LukStorms for code formatting edit.
I ended up finding a solution after trying a few other ideas. Initially I tried using a more formalized equation; while we did eventually get that to work, the problem was that translating a month as adding 30 or 31 days was too inaccurate.
What I ended up doing was building the dynamic queries into a table (#UpdateQuery_Temp), then using COALESCE to concatenate them into a single batch, and finally EXECUTE-ing that batch.
Create Table #UpdateQuery_Temp (TaskThresholdID int, UpdateQuery varchar(max))

insert into #UpdateQuery_Temp
Select Distinct ThresholdID,
       ('update #TempTable set StartDate=' + StartCalculation
        + ',DueDate=' + DueCalculation
        + ' where ThresholdID=' + cast(ThresholdID as varchar(2)) + ' ')
FROM #TempTable

DECLARE @UpdateQuery varchar(max)

SELECT @UpdateQuery = COALESCE(@UpdateQuery + ' ',' ') + UpdateQuery + ';'
FROM #UpdateQuery_Temp

EXECUTE (@UpdateQuery)
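With threshold IDs 2 and 4 in play, for example, @UpdateQuery ends up holding a batch like this (the calculations shown are the illustrative ones from the CASE example above):

update #TempTable set StartDate=DateAdd(m,-1,CalcDate),DueDate=DateAdd(d,4,CalcDate) where ThresholdID=2 ;
update #TempTable set StartDate=DateAdd(m,-1,CalcDate),DueDate=CalcDate where ThresholdID=4 ;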
Using this format, no matter how many combinations of Start/Due calculations there are, it can dynamically grab them from the table and execute them. Execution on 13,000 records took a small hit of 0.01 seconds - production tables are a few million records, but the time difference is small enough that it is worth it to get back to being table-driven.
I'm connecting to a DB2 database and executing SQL statements.
One example of what is being done is:
select field from library/file
[program code line finishes executing]
[increment value by one]
update library/file set field = 'incremented value'
I need to update the value immediately while returning it, rather than waiting for the script to complete and then running a separate UPDATE statement.
The concept of what I would like to do is this:
select field from library/file; update library/file set field = (Current Value + 1); go;
Please note... this is not the common SQL database most would be familiar with, it is a DB2 database on an IBM i.
Thanks!
Consider using a DB2 SEQUENCE to manage the next available number, if this file is simply intended to have a single row storing your counter. That is what a SEQUENCE is designed to do.
To set it up, use a CREATE SEQUENCE statement.
To increment and retrieve the value, use a sequence reference expression of the form NEXT VALUE FOR sequence-name. To find out what the most recent value was, use PREVIOUS VALUE FOR sequence-name. These expressions can be used like any regular column expression, such as in a SELECT or INSERT statement.
Suppose, for example you want to do this for invoice numbers (and maybe your accounting department doesn't want their first invoice number to be 000001, so we will initialize it higher).
CREATE SEQUENCE InvoiceSeq
as decimal (7,0)
start with 27000; -- for example
You could get a number for a new invoice like this:
SELECT NEXT VALUE FOR InvoiceSeq
INTO :myvar
FROM SYSIBM/SYSDUMMY1;
But what is this SYSIBM/SYSDUMMY1 table? We're not really getting anything from the table, so why pretend to? The SELECT needs a FROM-table clause, but since we don't actually need a table, let's use a VALUES INTO statement instead.
VALUES NEXT VALUE FOR InvoiceSeq
INTO :myvar;
So that has incremented the counter, and put the value into our variable. You could use that value to INSERT into our InvoiceHeaders and InvoiceDetails tables.
Or, you could increment the counter as you write an InvoiceHeader, then use it again when writing the InvoiceDetails.
INSERT INTO InvoiceHeaders
(InvoiceNbr, Customer, InvoiceDate)
VALUES (NEXT VALUE FOR InvoiceSeq, :custnbr, :invdate);
Then, for each invoice detail:
INSERT INTO InvoiceDetails
(InvoiceNbr, InvoiceLine, Reason, Fee)
VALUES (PREVIOUS VALUE FOR InvoiceSeq, :line, :itemtxt, :amt);
The PREVIOUS VALUE is local to the particular job, so there should be no risk of another job getting the same number.
update library/file set field = field + 1;
select field from library/file;
[program code line finishes executing]
[increment value by one]
This handles the problem of another app updating the number between the time you fetch it and the time you update it. Update it and then use it. If two apps try to update simultaneously, one will wait.
A SEQUENCE object is designed exactly for this purpose, but if you are forced to keep this 'next ID' file updated, this is how I'd do it. Follow the link in the comment by @Clockwork-Muse for info on the SEQUENCE object, or try this example from V5R4.
His request is like this:
UPDATE sometable
SET somecounter = somecounter + 10,
:returnvar = somecounter + 10;
Updates and retrieves at the same time.
This is possible in MSSQL - in fact I use it a lot there - but DB2 doesn't seem to have this feature.
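For reference, the SQL Server feature being described is the variable-assignment UPDATE, which writes the new value and captures it in one statement (table and column names are the illustrative ones from above):

DECLARE @returnvar INT;

-- increment the counter and grab the incremented value atomically
UPDATE sometable
SET @returnvar = somecounter = somecounter + 10;

SELECT @returnvar;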
The table on an external database (when I click Modify) states that column A is a varchar(10), but when I look at the data there are obviously many more characters in it. How is this possible?
This concerns me because when I pull data from that column, I only get 10 characters and the rest is cut off. I am not allowed to modify the external database tables.
How is this possible?
The column was probably originally a varchar(30) and was subsequently altered to varchar(10). I assume data has been written since the change to varchar(10), which makes this a true mess. If altering the column back to a length of 30 is not possible, I would investigate the implications of truncating the old data to 10 characters.
Update
run the following statement to confirm the column length:
select character_maximum_length
from information_schema.columns
where table_name='tablename' and COLUMN_NAME='columnname'
Update 2:
select max(len(column_name))
from tablename
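If it helps, the two checks can be combined into one statement (table and column names are placeholders):

-- declared length vs. the longest value actually stored
SELECT c.character_maximum_length AS declared_length,
       (SELECT MAX(LEN(columnname)) FROM tablename) AS longest_stored
FROM information_schema.columns c
WHERE c.table_name = 'tablename'
  AND c.column_name = 'columnname';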