CDC and how long are logs kept - SQL

I started using Change Data Capture (CDC) tables on Microsoft SQL Server 2016. It looks like a fairly easy mechanism to use, but the tutorial I was following had no info about the fact that data is kept in those tables for a limited time only; I think the default is 3 days.
I was trying to find some info about it but with no luck, so my question stands:
Is there a way to increase the time that logs are kept, or even turn the cleanup off?

You are looking for the Retention Period, which is indeed 3 days (4320 minutes) by default.
You can change it using sys.sp_cdc_change_job:
USE [YourDatabase];
EXECUTE sys.sp_cdc_change_job
    @job_type = N'cleanup',
    @retention = 2880;
From the documentation:
[ @retention = ] retention
Number of minutes that change rows are to be retained in change tables. retention is bigint, with a default of NULL, which indicates no change for this parameter. The maximum value is 52494800 (100 years). If specified, the value must be a positive integer. retention is valid only for cleanup jobs.
Please note that this affects ALL tables marked to be tracked by CDC in the database; there is no way to configure it per table.
https://msdn.microsoft.com/en-us/library/bb510748(v=sql.105).aspx
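To check what the cleanup job is currently configured with (retention is reported in minutes, so the 3-day default shows up as 4320), you can list the CDC jobs; a minimal sketch:
USE [YourDatabase];
EXECUTE sys.sp_cdc_help_jobs;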

Related

Finding statistical outliers in timestamp intervals with SQL Server

We have a bunch of devices in the field (various customer sites) that "call home" at regular intervals, configurable at the device but defaulting to 4 hours.
I have a view in SQL Server that displays the following information in descending chronological order:
DeviceInstanceId uniqueidentifier not null
AccountId int not null
CheckinTimestamp datetimeoffset(7) not null
SoftwareVersion string not null
Each time the device checks in, it will report its id and current software version which we store in a SQL Server db.
Some of these devices are in places with flaky network connectivity, which obviously prevents them from operating properly. There are also a bunch in datacenters where administrators regularly forget about them and change firewall/proxy settings, accidentally blocking the device's outbound communication. We need to proactively identify this bad connectivity so we can start investigating before we hear about it from an unhappy customer... because even if the problem is 99% certain to be on their end, they tend to feel (and as far as we are concerned, correctly) that we should know about it and bring it to their attention rather than vice versa.
I am trying to come up with a way to query all distinct DeviceInstanceId that have currently not checked in for a period of 150% their normal check-in interval. For example, let's say device 87C92D22-6C31-4091-8985-AA6877AD9B40 has, for the last 1000 checkins, checked in every 4 hours or so (give or take a few seconds)... but the last time it checked in was just a little over 6 hours ago now. This is information I would like to highlight for immediate review, along with device E117C276-9DF8-431F-A1D2-7EB7812A8350 which normally checks in every 2 hours, but it's been a little over 3 hours since the last check-in.
It seems relatively straightforward to brute-force this: loop through all the devices, examine the average interval between check-ins, see when the last check-in was, compare that to the current time, etc. But there are thousands of these devices, and the count grows larger every day. I need an efficient query that can generate this list of uncommunicative devices at least every hour... I just can't picture how to write that query.
Can someone help me with this? Maybe point me in the right direction? Thanks.
I am trying to come up with a way to query all distinct DeviceInstanceId that have currently not checked in for a period of 150% their normal check-in interval.
I think you can do:
select *
from (select DeviceInstanceId,
             -- average seconds between check-ins = total span / number of gaps;
             -- nullif() avoids division by zero for devices with a single check-in
             datediff(second, min(CheckinTimestamp), max(CheckinTimestamp)) / nullif(count(*) - 1, 0) as avg_secs,
             max(CheckinTimestamp) as max_CheckinTimestamp
      from t
      group by DeviceInstanceId
     ) t
-- sysdatetimeoffset() rather than getdate(): CheckinTimestamp is a datetimeoffset,
-- and comparing it against a plain datetime would treat local time as UTC
where max_CheckinTimestamp < dateadd(second, -avg_secs * 1.5, sysdatetimeoffset());
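The query above scans the entire check-in history every run. Since this needs to execute at least hourly over a growing table, an index on the grouping key and timestamp keeps the aggregates cheap; a sketch, assuming the source really is a table named t as in the query (the question only mentions a view):
create index IX_t_Device_Checkin
    on t (DeviceInstanceId, CheckinTimestamp);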

NUMBER(p, s) Data Type: Microsoft SQL Server

So, let me start by saying I have had a hard time finding any documentation about this online, hence I am asking here. I am having to manually calculate the size of a row in Microsoft SQL Server 2008 here at work (I know this can be done via a query; however, due to some hardware issues, it is not presently possible). Either way, I figured this question might help others in the long run:
Within the database that I am working in, there are a number of columns with data type NUMBER(), some of which have the precision and scale set. Now, I do know that precision affects size; however, here is the question: what is the range of on-disk sizes for the NUMBER data type in SQL Server, in bytes (any measurement is fine, actually)?
Some documentation will provide the possible ranges of sizes and the corresponding disk size. If you know of any documentation for this data type, please feel free to post.
OBSERVATION:
I have found documentation for type NUMERIC. Is that the same - or a different version of - NUMBER?
As Andrew has mentioned, this must be a user-defined type called NUMBER, since there is no built-in data type named NUMBER in SQL Server, so no one here can tell you what characteristics this data type has.
You can execute the query below to find out the characteristics of this user-defined data type.
SELECT *
FROM sys.types
WHERE is_user_defined = 1
AND name = 'NUMBER'
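To also see which built-in type NUMBER is based on, plus its precision and scale, you can join sys.types back to itself (for built-in types, user_type_id equals system_type_id); a sketch:
SELECT ut.name AS udt_name,
       bt.name AS base_type,
       ut.precision,
       ut.scale,
       ut.max_length
FROM sys.types AS ut
JOIN sys.types AS bt
    ON bt.user_type_id = ut.system_type_id
WHERE ut.is_user_defined = 1
  AND ut.name = 'NUMBER';
If the base type turns out to be numeric (the same thing as the NUMERIC from your OBSERVATION, and the documented SQL Server counterpart of Oracle's NUMBER), the on-disk size follows precision: 1-9 digits take 5 bytes, 10-19 take 9 bytes, 20-28 take 13 bytes, and 29-38 take 17 bytes.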

SQL_ID, CBO, Optimizer_Mode and Plan Change History

This is a 4-part question:
What is the logic behind SQL_ID... Does the value change for the same SQL over time? Does it persist between DB restarts? Or does every plan change give a new SQL_ID?
How can I check the plan change history for a particular query? Given the SQL_ID, I tried querying the dba_hist_sqlstat table, but it does not give the time of the plan change and other details needed to match against the v$sql_plan table.
I have the parameter optimizer_mode set to FIRST_ROWS. Even then, when I look at the dba_hist_sqlstat table, it indicates ALL_ROWS for some SQL_IDs... Can Oracle disregard the instance-level parameter and use what it deems most suitable?
Between 8 PM and 2 PM a query was performing badly, taking 6 seconds to execute. After 3 PM the query started responding in < 1 second. I have the AWR report for the periods that shows this detail. There was no difference in load on the DB between these two periods. How could I get to the root of this? I am trying to find the history of the plan change, but would appreciate more feedback on how best to analyze such issues.
The DB Version is Oracle 10.1.0.4 running on AIX 5.3 64b
1.- SQL_ID is calculated with a hash function based on the actual SQL text. It shouldn't change with a restart or between databases, at least on the same version (different Oracle versions could have different hashing functions), so as long as you do not change the SQL text (this includes blanks, commas and everything), the SQL_ID will remain the same.
2.- Use DBMS_XPLAN.DISPLAY_AWR to display all plans for a SQL_ID: select * from table(dbms_xplan.display_awr(sql_id => '[your SQL_ID]'));
3.- Oracle will only do this if the query has an optimizer-mode hint (e.g. /*+ ALL_ROWS */) in it.
4.- There are many things in play for this one. I would start by looking at the top 5 timed events in AWR for both periods of time. If they are alike, I would then go investigate the plan history for the statement, see if it changed between the periods, and how the data behaved during the periods as well. One of these three should give you the answer.
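On point 2, the snapshot times the poster was missing live in DBA_HIST_SNAPSHOT rather than in DBA_HIST_SQLSTAT itself; joining the two shows when plan_hash_value changed from one snapshot to the next. A sketch (fill in your own SQL_ID):
select sn.begin_interval_time,
       st.snap_id,
       st.plan_hash_value,
       st.executions_delta,
       st.elapsed_time_delta
from   dba_hist_sqlstat st
join   dba_hist_snapshot sn
  on   sn.snap_id = st.snap_id
 and   sn.dbid = st.dbid
 and   sn.instance_number = st.instance_number
where  st.sql_id = '[your SQL_ID]'
order  by sn.begin_interval_time;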

SQL MIN_ACTIVE_ROWVERSION() value does not change for a long while

We're troubleshooting a sort of Sync Framework between two SQL Server databases on separate servers (both SQL Server 2008 Enterprise 64-bit SP2 - 10.0.4000.0), through linked server connections, and we have reached a point where we're stuck.
The logic to identify which records are "pending to be synced" is of course based on ROWVERSION values, including the use of MIN_ACTIVE_ROWVERSION() to avoid dirty reads.
All SELECT operations are encapsulated in SPs on each "source" side. This is a schematic sample of one SP:
PROCEDURE LoaderRetrieve(@LastStamp bigint, @Rows int)
BEGIN
    ...
    (vars handling)
    ...
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT

    Select TOP (@Rows) Field1, Field2, Field3
    FROM Table
    WHERE [RowVersion] > @LastStampAsRowVersionDataType
      AND [RowVersion] < @MinActiveVersion
    Order by [RowVersion]
END
The approach works just fine; we usually sync records at the expected rate of 600k/hour (a job every 30 seconds, batch size = 5k). But at some point the sync process does not find a single record to be transferred, even though there are several thousand records with a ROWVERSION value greater than the @LastStamp parameter.
When checking the reason, we found that MIN_ACTIVE_ROWVERSION() has a value less than (or only slightly greater than, just 5 or 10 increments above) the @LastStamp being searched. This shouldn't be a problem by itself, since the MIN_ACTIVE_ROWVERSION() approach was introduced precisely to avoid dirty reads and subsequent issues, BUT:
The problem we see on some occasions, while the above scenario occurs, is that the value of MIN_ACTIVE_ROWVERSION() does not change for a long (really long) period of time, like 30/40 minutes, sometimes more than one hour. And this value is far less than the @@DBTS value.
We first thought this was related to a pending DB transaction not yet committed. As per the MSDN definition of MIN_ACTIVE_ROWVERSION() (link):
Returns the lowest active rowversion value in the current database. A rowversion value is active if it is used in a transaction that has not yet been committed.
But when checking sessions (sys.sysprocesses) with open_tran > 0 over the duration of this issue, we couldn't find any session with a waittime greater than a few seconds, only one or two occurrences of sessions with around 5 minutes of waittime.
So at this point we're struggling to understand the situation: MIN_ACTIVE_ROWVERSION() does not change over a huge period of time, and no uncommitted transactions with long waits are found within that time frame.
I'm not a DBA, so it could be that we're missing something in the picture; research on forums and blogs didn't turn up any other clue. So far open_tran > 0 was the valid explanation, but under the circumstances I've described it's clear there's something else, and I don't know what.
Any feedback is appreciated.
Well, I finally found the solution after digging a bit more.
The problem is that we were looking for sessions with a long waittime, but the real deal was to find sessions that had kept a batch active for a while.
If there's a session where open_tran = 1, then to find out exactly since when that transaction has been open (and of course still active, not yet committed), check the last_batch field of sys.sysprocesses.
Using this query:
select
    batchDurationMin  = DATEDIFF(second, last_batch, getutcdate()) / 60.0,
    batchDurationSecs = DATEDIFF(second, last_batch, getutcdate()),
    hostname,
    open_tran,
    *
from sys.sysprocesses a
where spid > 50          -- skip system sessions
  and a.open_tran > 0
order by last_batch asc
we could identify a session with an open transaction that had been active for 30+ minutes. With the hostname values and some more checks within the web services (and also using dbcc inputbuffer), we found the responsible process.
So the answer to the final question is: there was indeed an active session with an uncommitted transaction, and that is why MIN_ACTIVE_ROWVERSION() did not change. We were just looking for the offending processes with the wrong criteria.
Now that we know which process behaves like this, the next step will be to improve it.
Hope this proves useful to someone else.
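For what it's worth, the same check can be written against the transaction DMVs available since SQL Server 2005, which report the exact transaction start time directly instead of going through last_batch; a sketch:
select st.session_id,
       at.transaction_id,
       at.transaction_begin_time,
       datediff(minute, at.transaction_begin_time, getdate()) as open_minutes
from   sys.dm_tran_session_transactions st
join   sys.dm_tran_active_transactions at
  on   at.transaction_id = st.transaction_id
order  by at.transaction_begin_time asc;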

SQL Server string manipulation in a view... or in XSLT

I have been passed a piece of work that I can either do in my application or perhaps in SQL:
I have to get a date out of a string that may look like this:
1234567-DSP-01/01-VER-01/01
or like this:
1234567-VER-01/01-DSP-01/01
but may look like this:
00 12345 DISCH 01/01-VER-01/01 XXX X XXXXX
Yay. If it is a "DSP" then I want that date; if a "DISCH" then that date.
I am pulling the data out in a SQL Server view and would be happy to have the view transform the data for me. My application could do it, but that would add processor time. I could also see whether the data could be manipulated before it is entered into the DB, I suppose.
Thank you for your time.
An option would be to check for the presence of DSP or DISCH, then substring out the date as necessary.
For example (I don't have SQL Server today so I can't verify the syntax, sorry):
select
    date = case
               -- beg and num_chars are placeholders for the date's position and length
               when charindex('DSP', date_attribute) > 0 then substring(date_attribute, beg, num_chars)
               when charindex('DISCH', date_attribute) > 0 then substring(date_attribute, beg, num_chars)
               else 'unknown'
           end
from myTable
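Filling in those placeholders from the sample strings in the question (assuming the date is always the five 'MM/DD' characters that follow 'DSP-' or 'DISCH ', and keeping the hypothetical myTable/date_attribute names from above):
select date = case
                  when charindex('DSP-', date_attribute) > 0
                      then substring(date_attribute, charindex('DSP-', date_attribute) + 4, 5)
                  when charindex('DISCH ', date_attribute) > 0
                      then substring(date_attribute, charindex('DISCH ', date_attribute) + 6, 5)
                  else 'unknown'
              end
from myTable;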
don't store multiple items in the same column!
store the date in its own column when inserting the row!
add a new nullable column for the date
write an update that pulls the date out and sets the new column
alter the column to be not nullable
fix your save routine to pull the date out and insert it for you (a sketch of the first three steps follows below)
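A sketch of those first three steps, reusing the hypothetical myTable/date_attribute names and storing the extracted value as a varchar(5) 'MM/DD' string, since the samples carry no year:
-- add a new nullable column for the date
alter table myTable add extracted_date varchar(5) null;

-- backfill it with the extraction logic sketched earlier
update myTable
set extracted_date = case
                         when charindex('DSP-', date_attribute) > 0
                             then substring(date_attribute, charindex('DSP-', date_attribute) + 4, 5)
                         when charindex('DISCH ', date_attribute) > 0
                             then substring(date_attribute, charindex('DISCH ', date_attribute) + 6, 5)
                     end;

-- once every row is populated, make the column mandatory
alter table myTable alter column extracted_date varchar(5) not null;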
If you do it in the view, you're adding processing time on SQL Server, which in general is a more expensive resource than an app, web, or some other type of client.
I'd recommend you try to format the data out when you insert it, or handle it in the application tier. Scaling an app tier horizontally is so much easier than scaling your SQL Server.
Edit
I am talking about the database server's physical resources, which are usually more expensive than a properly designed application server's physical resources. This is because it is very easy to scale an application horizontally, while it is, in my opinion, an order of magnitude more expensive to scale a DB server horizontally, especially if you're dealing with a transactional database and need to manage merging.
I am not saying it is not possible, just that scaling a database server horizontally is a much more difficult task, hence more expensive. The only reason I pointed this out is that the OP raised a concern about using CPU cycles on the app server vs. the database server. Most applications I have worked with have been data-centric applications that churned through GBs of data to get a user an answer. We initially put everything on the database server because it was easier than doing it in classic ASP and VB6 at the time. Over time the DB server got more and more loaded, until scaling vertically was no longer an option.
Database servers are also designed for retrieving and joining data. You should leave the formatting of the data to the application and business rules (in general, of course).