SQL Server CDC Invalid column name __$command_id

I have enabled CDC on a source database and created the following packages:
Initial load: CDC start --> Data flow --> CDC end
Incremental load: CDC start (get processing range) --> Data flow --> CDC end (mark processed range)
These packages run perfectly fine when I run them manually, but I get the following error message when they run through a scheduled job:
Data Flow Task: Error: "Problems when trying to get changed records
from dbo_AddonQuote. Reason - Invalid column name '__$command_id'"
Here is the CDC state value:
ILUPDATE/CS/0x0000053600005CFD0002/CE/0x000005360000604F0004/IR/0x0000053600005CFD0002/0x0000053600005D140002/TS/2018-03-22T23:10:22.5173580/
As I said before, this does not happen when I run the packages manually.
Can anyone shed some light on what is happening here, or how to debug this issue?

I was the one who asked MS to add that column, because I had discovered a few CDC bugs. They added the column, but they did it in an incorrect/inconsistent way.
Recently they released new CUs that fix a few CDC bugs; one of them is (likely) for your issue. Download the latest CU for your version and/or try to execute
sp_cdc_vupgrade
against the database enabled for CDC.
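A minimal sketch (the database name is hypothetical):
USE YourCdcDatabase;  -- hypothetical name of the database enabled for CDC
GO
EXEC sys.sp_cdc_vupgrade;
GO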
Before that, check (for example with the queries sketched below):
whether your capture instance (cdc.dbo_AddonQuote_CT) has that column (__$command_id)
whether the CDC stored procedures ([cdc].[sp_batchinsert_xxxxx]) reference that column
whether the CDC functions ([cdc].[fn_cdc_get_net_changes_dbo_xxxxx]) reference that column
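A minimal sketch of those checks, reusing the capture instance name from the question (adjust the names to your own):
-- 1. Does the capture instance table have the __$command_id column?
SELECT name
FROM sys.columns
WHERE object_id = OBJECT_ID('cdc.dbo_AddonQuote_CT')
  AND name = '__$command_id';

-- 2./3. Do any CDC stored procedures or functions reference that column?
SELECT OBJECT_NAME(object_id) AS object_name
FROM sys.sql_modules
WHERE OBJECT_SCHEMA_NAME(object_id) = 'cdc'
  AND definition LIKE '%\_\_$command\_id%' ESCAPE '\';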
BTW, we don't use the SSIS CDC data flow. It's better to create your own solution. The MS CDC get net changes functions are very slow in certain scenarios, and in certain scenarios they return incorrect results. If you create your own methods to read the data from the capture instances, it will be more reliable and faster.
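Purely as an illustration of that approach (not a fix for the error above), a minimal sketch of reading changes straight from a capture instance table, reusing the instance name from the question:
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_AddonQuote');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

SELECT ct.__$start_lsn,
       ct.__$seqval,
       ct.__$operation,   -- 1 = delete, 2 = insert, 3 = update (before image), 4 = update (after image)
       ct.*
FROM cdc.dbo_AddonQuote_CT AS ct
WHERE ct.__$start_lsn BETWEEN @from_lsn AND @to_lsn
ORDER BY ct.__$start_lsn, ct.__$seqval;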

The following change fixed my issue.
My old package:
Get processing range --> Data flow --> Mark processed range
Updated package:
CDC start --> Get processing range --> Data flow --> Mark processed range
Here is a reference link that explains the issue:
http://www.bradleyschacht.com/understanding-the-cdc-state-value/

None of the above solutions worked for me.
I'm using MS SQL 2017 with the latest CU15.
In the end I didn't use CDC start in SSIS, but I found an official solution in Microsoft DWH certification materials that works fine, provided you are not using CDC start.
The following example shows how to change
cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes in order to avoid such problems in SSIS.
You need to change the __$command_id column to NULL for every value.
Here it is for the [HumanResources] demo database with only one table, called Employee. No other changes are needed. Just run this for your db/table and your CDC in SSIS will work without any other modifications.
USE [HumanResources]
GO
-- Rename the original CDC wrapper functions, then recreate them so __$command_id is always returned (as NULL)
EXEC sp_rename N'cdc.fn_cdc_get_all_changes_dbo_Employee', N'fn_cdc_get_all_changes_dbo_Employee_safe'
GO
CREATE FUNCTION cdc.fn_cdc_get_all_changes_dbo_Employee (
    @from_lsn BINARY(10),
    @to_lsn BINARY(10),
    @row_filter_option NVARCHAR(30))
RETURNS TABLE
AS
RETURN SELECT *, NULL AS __$command_id
       FROM cdc.fn_cdc_get_all_changes_dbo_Employee_safe(
                @from_lsn,
                @to_lsn,
                @row_filter_option)
GO
EXEC sp_rename N'cdc.fn_cdc_get_net_changes_dbo_Employee', N'fn_cdc_get_net_changes_dbo_Employee_safe'
GO
CREATE FUNCTION cdc.fn_cdc_get_net_changes_dbo_Employee (
    @from_lsn BINARY(10),
    @to_lsn BINARY(10),
    @row_filter_option NVARCHAR(30))
RETURNS TABLE
AS
RETURN SELECT *, NULL AS __$command_id
       FROM cdc.fn_cdc_get_net_changes_dbo_Employee_safe(
                @from_lsn,
                @to_lsn,
                @row_filter_option)
GO

Related

How can I schedule a script in BigQuery?

At last, BigQuery supports using ; in queries, so I can write more than one query in one "block" if I separate them with semicolons.
If I run the code manually, it works. But I cannot schedule it.
When I want to schedule it, I have two choices:
(New) Web UI: I must give a destination table. If I don't, I cannot save the scheduled query. But all my queries are updates and inserts with different "destination tables", like these:
UPDATE project.exampledataset.a
SET date = current_date()
WHERE TRUE
;
INSERT INTO project.otherdataset.b
SELECT c,d
FROM project.otherdataset.c
So I cannot even set up the scheduling in the Web UI.
Classic UI: I tried this because the official documentation states that I should leave the "destination table" blank, and the Classic UI allows it. I can set up the scheduling, but it doesn't run when it should. I get the error message by email: "Error status: Dataset specified in the query ('') is not consistent with Destination dataset 'exampledataset'."
AFAIK scripting (and using semicolons) is a very new feature in BigQuery, but I hope someone can help me.
Yes, I know that I could schedule every query one by one, but I would like to resolve it with one big script.
It looks like the scheduled query was defined earlier with a destination dataset and an APPEND/TRUNCATE write disposition. When updating that scheduled query to a DML query, the GUI doesn't expose the dataset field / table name so that you can clear it, hence this error appears because the previously set dataset and table name are still attached to the scheduled query.
The fix is therefore to delete the scheduled query and create it from scratch with the DML query option. It worked for me.
Scripting is now supported in scheduled queries. However, a scripted query, when scheduled, does not currently support setting a destination table, so you still need to use DDL/DML to make changes to an existing table.
E.g.:
CREATE OR REPLACE TABLE destinationTable AS
SELECT *
FROM sourceTable
WHERE date >= maxDate
As of 2022, the BQ Console UI will let you create a new scheduled query without a destination dataset, but it won't let you update a prior SELECT to use DDL/DML block syntax. However, you can use the BigQuery Data Transfer API to update the destinationDatasetId field, via transferconfigs/patch. Use transferconfigs/list to get the configId for a given scheduled query.
Note that you can either use the in-browser API Explorer, if you have the appropriate credentials, or write a programmatic solution. Also seems useful for setting/updating any other fields, including renaming scheduled queries.

SSIS error in Data Flow Task with Dynamic connection

I have very basic skills in developing SSIS packages, and I am getting errors while developing this new package. With this package, the SQLInstance is determined fine, as can be seen in the column mapping in the second picture. But it is not reading the columns of a user table (the IndexType column, in this case). This is the issue.
I tried the steps below with no luck so far:
I set the ValidateExternalMetaData property to False; still the same error.
I removed all columns one by one to identify whether the issue is with some specific data type; still the same issue.
I created a brand-new test package; the same error occurs in the test package as well.
Another package works fine in production with the same settings against a user database. I copied the Data Flow Task component from it and used it in the new package; still the same issue.
Please help. Many thanks.
It may be the SQL Server version. I had a similar issue when using table variables or temp tables. You need to use WITH RESULT SETS, similar to this:
EXEC('SELECT 43112609 AS val;')
WITH RESULT SETS
(
(
val VARCHAR(10)
)
);
Article here:
http://www.itprotoday.com/sql-server-2012-t-sql-glance-execute-result-sets
SQL Server cannot tell what is being returned when using temp tables or table variables, so you have to specify it explicitly. This is needed for some versions of SQL Server. A sketch for the temp-table case follows.
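For instance, the same idea when the batch builds a temp table first (a hedged sketch mirroring the example above; the table and values are illustrative):
EXEC('
    CREATE TABLE #t (val INT);
    INSERT INTO #t (val) VALUES (43112609);
    SELECT val FROM #t;
')
WITH RESULT SETS
(
    (
        val INT
    )
);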

Pentaho Execute SQL Statements variable conversion to null

I am using PDI to delete and insert some data in a DB, and I have the following issue. I create two variables called START_DATE and END_DATE that are used to select the data that will be deleted from my DB. I am able to get them and run my transformation with no errors in the log file, but when I checked whether data was deleted, I found that it wasn't. I then checked my "DeleteProcedure" step, and it says "Conversion error: null". I have tried different approaches to take the variables and pass them as Strings, but I haven't been able to solve this issue. It cannot be a SQL mistake, as I tested it with a constant and it works.
Any ideas? I attach some pics. Thanks!
As the documentation of the Execute SQL script step says:
Note: When you have an issue that the SQL is started at the initialization phase of the transformation and not for each row, make sure to check the option "Execute for each row" (see description below).
In your case the statement executes during the initialization phase of the transformation; that's why it gets null values instead of the ones from the previous step.
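With that option checked, the step takes its parameters from the incoming row; a minimal sketch, assuming the stream fields START_DATE and END_DATE are listed as parameters and the table/column names are hypothetical:
DELETE FROM my_schema.my_table   -- hypothetical target table
WHERE event_date >= ?            -- first parameter: START_DATE
  AND event_date <  ?;           -- second parameter: END_DATE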

SQL INSERT sp_cursor Error

I have a pair of linked SQL servers: ServerA and ServerB. I want to write a simple INSERT INTO SELECT statement which will copy a row from ServerA's database to ServerB's database. ServerB's database was copied directly from ServerA's, and so they should have the exact same basic structure (same column names, etc.)
The problem is that when I try to execute the following statement:
INSERT INTO [ServerB].[data_collection].[dbo].[table1]
SELECT * FROM [ServerA].[data_collection].[dbo].[table1]
I get the following error:
Msg 16902, Level 16, State 48, Line 1
sp_cursor: The value of the parameter 'value' is invalid.
On the other hand, if I try to execute the following statement:
INSERT INTO [ServerB].[data_collection].[dbo].[table1] (Time)
SELECT Time FROM [ServerA].[data_collection].[dbo].[table1]
The above statement executes just fine, regardless of which or how many columns I specify in the insert.
So my question here is: why does my INSERT INTO ... SELECT statement work properly when I explicitly specify which columns to copy, but not when I tell it to copy everything using "*"? My second question would then be: how do I fix the problem?
Googling around to follow up on my initial hunch, I found a source I consider reliable enough to cite in an answer.
The 'value' parameter specified isn't one of your columns, it is the optional argument to sp_cursor that is called implicitly via your INSERT INTO...SELECT.
From SQL Server Central...
I have an ssis package that needs to populate a sql table with data
from a pipe-delimited text file containing 992 (!) columns per record.
...Initially I'd set up the package to contain a data flow task to use
an ole db destination control where the access mode was set to Table
or view mode. For some reason though, when running the package it
would crash, with an error stating the parameter 'value' was not valid
in the sp_cursor procedure. On setting up a trace in profiler to see
what this control actually does it appears it tries to insert the
records using the sp_cursor procedure. Running the same query in SQL
Server Management Studio gives the same result. After much testing and
pulling of hair out, I've found that by replacing the sp_cursor
statement with an insert statement the record populated fine which
suggests that sp_cursor cannot cope when more than a certain number
of parameters are attempted. Not sure of the figure.
Note the common theme here between your situation and the one cited - a bazillion columns.
That same source offers a workaround as well.
I've managed to get round this problem however by setting the access
mode to be "Table or view - fast load". Viewing the trace again
confirms that SSIS attempts this via a "insert bulk" statement which
loads fine.

Empty XML Columns during SQL Server replication

We have a merge replication setup on SQL Server that goes like this: 1 SQL server at the office, another SQL server traveling around the world. The publisher is the SQL server at the office.
In about 1% of cases, two of our tables with a column of XML data type (not bound to a schema) are replicated with rows containing empty XML columns. (This has only happened when data is sent from the "traveling server" back home, but then again, data seems to be changed more often there.) We only see this in the production environment (WAN replication).
Things I have verified:
The row is replicated, as the last modification date on the row is refreshed, but the XML column is empty. Of course it is not empty on the other SQL Server.
No conflicts are displayed in the replication conflicts UI.
It is not caused by the size of the data inside the XML column, as some values are very small.
Usually the problem occurs in batches. (The XML column of 8-9 consecutive rows will be empty.)
The problem occurs whether a row was inserted or updated. No pattern there.
The problem seems to occur when the connection is weaker, though this is pure speculation on my part. (We've seen this problem happen more often when the server was far away as compared to when it was close by.)
Sorry if I have confused some things; I am not really a DBA, more of a DEV with knowledge of SQL. But since the application using the database keeps getting blamed for the problems (the XML column must not be empty!!), I have taken it to heart to try and find the problem instead of just manually patching the data each time (what's the use of replication if you have to do that?).
If anyone could help out with this problem, or at least suggest some ways of debugging / investigating it, it would be greatly appreciated.
I searched a lot on Google and found this: Hot Fix. But we do have the latest service pack, and the problem seems a bit different.
FYI: We have a replication setup locally here, but the problem never occurs there. We will also be trying a WAN simulator on it to see if that helps.
Thanks
Edit: a hotfix is now available for my issue: http://support.microsoft.com/kb/2591902
After logging this issue with Microsoft, we were able to reproduce the problem without a slow link (big thanks to the competent escalation engineer at Microsoft). The repro is a bit different from our scenario, but it highlights the timing issue we were getting perfectly; a minimal schema sketch follows the steps below.
Create 2 tables – one parent, one child (with a PK-FK relationship)
Insert 2 rows in the parent table
Set up replication – configure merge agent to run ON DEMAND
Sync
Once all is replicated:
On the PUBLISHER: delete one row from the parent table
On the SUBSCRIBER: insert 2 rows of data that reference the parentid you deleted above
Insert 5 rows of data that references the parentid that will stay in the table
Sync, Merge agent will fail, Sync again, Merge agent will succeed
Missing XML data on the publisher on the 5 rows.
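For reference, a minimal sketch of what such a repro schema could look like (table and column names are illustrative, not the ones Microsoft used):
CREATE TABLE dbo.Parent (
    ParentId INT NOT NULL PRIMARY KEY,
    Payload  XML NULL
);
CREATE TABLE dbo.Child (
    ChildId  INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ParentId INT NOT NULL
        CONSTRAINT FK_Child_Parent REFERENCES dbo.Parent (ParentId),
    Payload  XML NULL
);
INSERT INTO dbo.Parent (ParentId) VALUES (1), (2);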
It seems to be a bug in SQL Server 2005/2008 and 2008 R2.
It will be addressed in a hotfix for 2008 and up (as SQL Server 2005 is no longer being altered).
Cheers.
You may want to start out by slapping a bandaid on this perplexing situation to buy some time to fully investigate and fix (or more likely get MS to fix it). SQL Data Compare is an excellent tool that might help.
Figured I'd put an update here, as this issue gave me a few gray hairs and I am somewhat closer to a solution now.
I finally had some time to work on this and managed to reproduce the issue in our test environment, using a WAN simulator to slow down the link and inject some random packet loss (to best simulate the production environment, where the server is overseas on a really bad line).
After doing some SQL tracing and some verbose logging, here are my conclusions:
When replicating a row with an XML column, the process is done in two steps. First, an insert of the full row is done, but with an empty string for the XML column. Right after, an update is done, this time with the XML column containing the data. Since the link is slow, in some situations a foreign key violation occurred.
In this scenario, Table2 depends on Table1. After finishing replicating Table1 and starting to replicate Table2 (an enumeration of inserts/updates, which takes time on a slow link), some entries were added to Table1 and Table2. Therefore some inserts into Table2 failed because the Table1 entries were not yet in the database and were only going to be replicated in the next batch. The next time replication occurred, no more foreign key violations occurred; however, when it tried to insert the row that had previously failed in Table2 (the row with the XML column), the update part was missing (I could see that in SQL Profiler), and that is why the row ended up, after all was done, with an empty XML column.
Setting "Enforce for replication" to false on the foreign keys seems to address the problem, however I do still think that this whole process should work with the option set to true.
I logged a support call with Microsoft for this. I have sent the traces and logs to Microsoft and will see what they have to say.
I've read this article: http://msdn.microsoft.com/en-us/library/ms152529(v=SQL.90).aspx. But for me, setting this option to false is kind of a workaround, no?
What do you guys think?
PS: I hope this is clear; I tried to explain it as best I could. English is not my first language.