Sybase: how in/off row LOB impact blocking


In our Sybase ASE 15.7 database we have a table with four columns:
uid int
id varchar(32)
version varchar(32)
xml text
There are two indexes, on uid and id respectively; the table uses datarows locking and holds ~130 rows of data.
The server has a 2k page size; the xml length is min 1012 / max 5176 / avg 1837. At the moment the column is stored as an off-row LOB.
The problem is that sometimes a simple insert takes ~10-15 seconds, and I'm struggling to understand why.
Can anyone offer any theories?
Would an in-row LOB help, perhaps with a size of 2000?
In general, how do in-row and off-row storage affect locking?

I would say you need to test it, which sounds obvious, but it's key for a change such as this. The thing about text columns is that, in effect, you have a heap of data at the end of the table, which can be a focal point for contention and blocking.
That said, in your particular example your average row size is pretty close to the maximum row size (1962 bytes) for the page size of your Sybase server. You can't set the in-row length to 2000 on a 2k page server because that's bigger than the largest row a page can hold.
Realistically you would have to set it to 1894, which is your max row size less the declared widths of your other columns (1962 - 4 - 32 - 32), but that's probably a bit close to the maximum, so it depends how full your id and version columns are.
You also do not specify what type of indexes are in use, i.e. clustered or non-clustered: if they're non-clustered it's still a heap table and can be subject to last-page contention on the page chain. You also don't quantify the row count of the table or give any other information as to why your inserts are too slow (transaction volume and so on), so consider adding this information to your post, along with query plans and whether the ID is sequential and therefore a hotspot for inserts.
In-row LOB can work well for the right data.
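The sizing argument in this answer can be checked with a few lines of arithmetic. The following is only a rough sketch in Python: it uses the figures quoted in the question and answer, treats the varchar columns as completely full, and ignores any per-row overhead, which is a simplifying assumption. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the in-row LOB budget on a 2k-page ASE server.
# Figures come from the post; per-row overhead is ignored (an assumption).

MAX_ROW_SIZE = 1962          # maximum row size quoted for a 2k page
OTHER_COLUMNS = {            # declared widths of the non-LOB columns, in bytes
    "uid": 4,                # int
    "id": 32,                # varchar(32), worst case (completely full)
    "version": 32,           # varchar(32), worst case (completely full)
}


def in_row_budget(max_row_size: int, other_columns: dict) -> int:
    """Bytes left for the LOB when every other column is at its declared width."""
    return max_row_size - sum(other_columns.values())


def fits_in_row(xml_length: int, budget: int) -> bool:
    """True if a value of this length could stay in the row under the budget."""
    return xml_length <= budget


budget = in_row_budget(MAX_ROW_SIZE, OTHER_COLUMNS)
print("in-row budget:", budget)
for label, length in [("min", 1012), ("avg", 1837), ("max", 5176)]:
    where = "in-row" if fits_in_row(length, budget) else "off-row"
    print(f"{label} xml length {length}: {where}")
```

Under these assumptions the average document only just fits and the largest ones would still go off-row, so an in-row setting would change where some rows live rather than eliminate off-row storage.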

Related

SQL Query Performance with an nvarchar(500) where the MAX(LEN(column)) < 30 [duplicate]

I've read up on this on the MSDN forums and here, and I'm still not clear. I think this is correct: varchar(max) will be stored as a text datatype, so that has drawbacks. So let's say your field will reliably be under 8000 characters, like a BusinessName field in my database table. In reality, a business name will probably always be under (pulling a number out of my hat) 500 characters. It seems like plenty of the varchar fields that I run across fall well under the 8k character count.
So should I make that field a varchar(500) instead of varchar(8000)? From what I understand of SQL, there's no difference between those two. So, to make life easy, I'd want to define all my varchar fields as varchar(8000). Does that have any drawbacks?
Related: Size of varchar columns (I didn't feel like this one answered my question).

One example where this can make a difference is that an over-wide declaration can prevent a performance optimization that avoids adding row versioning information to tables with AFTER triggers.
This is covered by Paul White here:
"The actual size of the data stored is immaterial – it is the potential size that matters."
Similarly, with memory-optimised tables it has been possible since SQL Server 2016 to use LOB columns, or combinations of column widths that could potentially exceed the in-row limit, but with a penalty:
"(Max) columns are always stored off-row. For other columns, if the data row size in the table definition can exceed 8,060 bytes, SQL Server pushes largest variable-length column(s) off-row. Again, it does not depend on amount of the data you store there."
This can have a large negative effect on memory consumption and performance.
Another case where over-declaring column widths can make a big difference is if the table will ever be processed using SSIS. The memory allocated for variable-length (non-BLOB) columns is fixed for each row in an execution tree and is based on the columns' declared maximum length, which can lead to inefficient usage of memory buffers (example). Whilst the SSIS package developer can declare a smaller column size than the source, this analysis is best done up front and enforced there.
Back in the SQL Server engine itself, a similar case is that when calculating the memory grant to allocate for SORT operations, SQL Server assumes that varchar(x) columns will on average consume x/2 bytes.
If most of your varchar columns are fuller than that, this can lead to the sort operations spilling to tempdb.
In your case, if your varchar columns are declared as 8000 bytes but actually have contents much shorter than that, your query will be allocated memory that it doesn't require, which is obviously inefficient and can lead to waits for memory grants.
This is covered in Part 2 of SQL Workshops Webcast 1, downloadable from here, or see below.
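USE tempdb;
-- Two columns with identical contents but different declared widths
CREATE TABLE T(
    id INT IDENTITY(1,1) PRIMARY KEY,
    number INT,
    name8000 VARCHAR(8000),
    name500 VARCHAR(500));
INSERT INTO T
    (number, name8000, name500)
SELECT number, name, name /*<--Same contents in both cols*/
FROM master..spt_values;
-- Sort carrying the narrowly declared column
SELECT id, name500
FROM T
ORDER BY number;
-- Same sort carrying the widely declared column; compare the memory grants
SELECT id, name8000
FROM T
ORDER BY number;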
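To put the x/2 assumption described in this answer into numbers, here is a small Python sketch. It only illustrates the arithmetic: the row count and the helper name are hypothetical, and a real memory grant also accounts for the other columns and per-row overhead, which are ignored here.

```python
# Rough illustration of the x/2 sizing assumption for variable-length columns.
# ROW_COUNT and the helper name are hypothetical; real grants also include
# per-row overhead and the other columns, which this sketch ignores.

ROW_COUNT = 2500  # hypothetical number of rows being sorted


def assumed_sort_bytes(declared_width: int, rows: int) -> int:
    """Bytes assumed for one varchar(declared_width) column across all rows."""
    return (declared_width // 2) * rows


for width in (500, 8000):
    total = assumed_sort_bytes(width, ROW_COUNT)
    print(f"varchar({width}): {total / 1024 / 1024:.1f} MiB assumed")
```

With identical contents in both columns, the assumed size for the varchar(8000) column is 16 times that of the varchar(500) column, purely because of the declared width.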

From a processing standpoint, it will not make a difference to use varchar(8000) vs varchar(500). It's more of a "good practice" kind of thing to define a maximum length that a field should hold and make your varchar that length. It's something that can be used to assist with data validation. For instance, making a state abbreviation be 2 characters or a postal/zip code as 5 or 9 characters. This used to be a more important distinction for when your data interacted with other systems or user interfaces where field length was critical (e.g. a mainframe flat file dataset), but nowadays I think it's more habit than anything else.

There are some disadvantages to large columns that are a bit less obvious and might catch you a little later:
The columns you use in an INDEX key must not exceed 900 bytes in total.
All the columns in an ORDER BY clause may not exceed 8060 bytes. This is a bit difficult to grasp since it only applies to some columns (see SQL 2008 R2 Row size limit exceeded for details).
If the total row size exceeds 8060 bytes, you get a "page spill" for that row. This might affect performance. (A page is the basic unit of storage in SQL Server and is fixed at 8 KB, of which at most 8060 bytes can hold a single row. Exceeding this will not be severe, but it's noticeable, and you should try to avoid it if you easily can.)
Many other internal data structures, buffers and, last but not least, your own variables and table variables all need to mirror these sizes. With excessive sizes, excessive memory allocation can affect performance.
As a general rule, try to be conservative with the column width. If it becomes a problem, you can easily expand it to fit your needs. If you notice memory issues later, shrinking a wide column may be impossible without losing data, and you won't know where to begin.
In your example of the business names, think about where you get to display them. Is there really space for 500 characters? If not, there is little point in storing them as such. http://en.wikipedia.org/wiki/List_of_companies_of_the_United_States lists some company names and the max is about 50 characters. So I'd use 100 for the column max. Maybe more like 80.

Apart from best practices (BBlake's answer):
You get warnings about the maximum row size (8060 bytes) and index width (900 bytes) with DDL.
DML will die if you exceed these limits.
ANSI PADDING ON is the default, so you could end up storing a whole load of whitespace.

Ideally you'd want to go smaller than that, down to a reasonably sized length (500 isn't reasonably sized) and make sure the client validation catches when the data is going to be too large and send a useful error.
While the varchar isn't actually going to reserve space in the database for the unused space, I recall versions of SQL Server having a snit about database rows being wider than some number of bytes (do not recall the exact count) and actually throwing out whatever data didn't fit. A certain number of those bytes were reserved for things internal to SQL Server.

What is the performance impact of exceeding the 8060 bytes per row in SQL Server?

If I create a SQL Server table that has more than 8060 bytes per row, will querying be considerably hurt for columns that exceed this limit?
I also don't quite understand whether a row occupies the whole 8060 bytes even if it's empty. If that's true, will query performance be affected just for the particular rows that exceed the limit, or for all rows?

For the first question:
Yes, it could affect performance. Having a combination of varchar, nvarchar, sql_variant and varbinary columns in one table with a total size greater than 8,060 bytes results in data being relocated to another page.
While this affects updates, I'm not sure it matters much for reading data. Internally, SQL Server puts a pointer in the row to the relocated portion of the data on the new page, so I guess it's quite a fast operation.
It's up to you (DBA/developer) to analyze and predict the percentage of such rows in the table. If it occurs too often, consider moving large columns into separate table(s).
Use sys.dm_db_index_physical_stats to find out what's going on with your data.
Second question:
I guess you are asking about the situation where some columns (especially varchar) are empty. You can "help" SQL Server save space by using sparse columns.
Also, I'd recommend this article.

Need to exceed the 8k record limit when using wide columns/sparse tables

Need to exceed the 8k record limit in SQL Server 2008 when using wide columns/sparse tables.
Long story short: a new client with an old survey system that pivots the data so that every answer is a column.
I have 1500 columns and now I am getting:
Cannot create a row that has sparse data of size 9652 which is greater than the allowable maximum sparse data size of 8019.
I need to exceed the 8k record limit if possible.

Not possible, because SQL Server stores rows on 8K pages. The only way to do so would be to store some of the data off-row (e.g. using MAX or other LOB types for some of your columns). To your application, this will still look like it's on the same row, even though physically it is in a completely different area of disk.
If your sparse column set alone exceeds the limit, sorry, you'll need to look at a different way to store the data (not pivoted, EAV, or simply two tables joined by a key, each containing half of the column set). For the latter you can make this relatively transparent to users by using views and/or enforcing all data access / DML through stored procedures that understand the division.
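As a rough illustration of the two-table option, the Python sketch below splits a list of answer columns in half and generates the text of a view that joins the halves back together. Every table, view, and column name here (survey_part1, survey_part2, survey_wide, respondent_id, q1..q1500) is hypothetical, and halving the column list does not by itself guarantee that each half stays under the sparse data limit; that depends on the actual data.

```python
# Sketch: split a wide column list across two tables and generate a view
# that presents them as one row. All table, view, and column names are
# hypothetical; the generated SQL is illustrative only.


def split_columns(columns: list, parts: int = 2) -> list:
    """Split columns into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(columns), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(columns[start:end])
        start = end
    return chunks


def build_view_sql(view: str, key: str, tables: dict) -> str:
    """Build a CREATE VIEW statement joining the tables on the shared key."""
    names = list(tables)
    first = names[0]
    select_list = [f"{first}.{key}"]
    for name in names:
        select_list += [f"{name}.{col}" for col in tables[name]]
    joins = "".join(
        f"\nJOIN {name} ON {name}.{key} = {first}.{key}" for name in names[1:]
    )
    return (
        f"CREATE VIEW {view} AS\nSELECT "
        + ", ".join(select_list)
        + f"\nFROM {first}{joins};"
    )


columns = [f"q{i}" for i in range(1, 1501)]   # 1500 answer columns
half_a, half_b = split_columns(columns)
sql = build_view_sql(
    "survey_wide",
    "respondent_id",
    # Only the first three columns of each half, to keep the output readable
    {"survey_part1": half_a[:3], "survey_part2": half_b[:3]},
)
print(len(half_a), len(half_b))
print(sql)
```

Readers of the view see one wide row per key, while writes would still need to go through code that knows which table owns which column, as the answer suggests.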

How big can a SQL Server row be before it's a problem?

Occasionally I run into the limitation in SQL Server 2000 that a row cannot exceed 8K bytes. SQL Server 2000 isn't really state of the art, but it's still running production code, and because some tables are denormalized that's a problem.
However, this seems to be a non-issue with SQL Server 2005. At least, it won't complain that row sizes are bigger than 8K. But what happens instead, and why was this a problem in SQL Server 2000?
Do I need to care about my rows growing? Should I try to avoid large rows? Are varchar(max) and varbinary(max) a solution, or are they expensive in terms of size in the database and/or CPU time?
Why do I care at all about specifying the length of a particular column, when it seems like it's just a matter of time before someone's going to hit that upper limit?

Read up on SQL Server 2005 row size limit here:
How Sql Server 2005 bypasses the 8KB row size limitation
SQL Server will split the row data if it's greater than 8K and store the superfluous data into a second data page using a pointer to it in the original one. This will impact performance on queries and joins.

There is still a row size limit - the minimum row size cannot exceed 8060 bytes
CREATE TABLE Table1 (
col1 char(2000),
col2 char(2000),
col3 char(2000),
col4 char(2000),
col5 char(2000)
);
Creating or altering table 'Table1' failed because the minimum row size would be
10007, including 7 bytes of internal overhead. This exceeds the maximum allowable
table row size of 8060 bytes.
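The number in that error message is easy to reproduce. A minimal Python sketch of the arithmetic follows; the 7 bytes of overhead and the 8060-byte limit are taken from the message for this particular table and are not meant as general constants, and the function name is made up for illustration.

```python
# Reproduces the arithmetic behind the error message above.
ROW_LIMIT = 8060        # maximum allowable row size, from the error text
ROW_OVERHEAD = 7        # internal overhead for this table, from the error text


def minimum_row_size(fixed_widths: list) -> int:
    """Minimum row size for fixed-length columns plus the quoted overhead."""
    return sum(fixed_widths) + ROW_OVERHEAD


size = minimum_row_size([2000] * 5)   # col1..col5, each char(2000)
print(size, "exceeds limit" if size > ROW_LIMIT else "fits")
```

Because char columns are fixed length, all five are counted at their full declared width regardless of what is stored in them.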
When you use varchar(MAX), strings that do not fit inside the row are stored in another location, so in this way you can store more than 8060 bytes. Storing lots of large strings is of course expensive. Just do the calculations and you can see that it will quickly consume large amounts of disk space. But if you do need to store large strings then it's OK to do that. The database can handle it.

