I have a table with an NVARCHAR(MAX) column. About 90% of the time the string length is between 255 and 500 characters. Some values go well over 22,000 characters; that is XML the business won't ever use for reporting purposes, so it isn't required. To cut a long story short: what is the best way to trim out all the excess bulk? I have tried the usual
left(column,500)
and
substring(column,1,500)
I have set the destination column to a length of 500.
However, loading the table from source to destination takes a while because of that column alone. I am doing this in the SSIS source. I have also gone to the output column and set it to ignore truncation. Is there any way I can reduce the time taken to load this column? These methods seem to take as long as loading the full-length values. Any suggestion will be greatly appreciated.
NVARCHAR(MAX) (even when passed through a function like SUBSTRING or LEFT) will cost a lot of memory and will fill up your buffers quickly. Check DefaultBufferSize and DefaultBufferMaxRows, and also the properties BLOBTempStoragePath and BufferTempStoragePath; setting them to optimal values might increase performance, but note that you have to configure them carefully, because they are a double-edged sword.
Also, if the source and destination are on different servers, the network could be an issue, because all data has to go from your SQL Server to your SSIS server over the network. You could try changing the network packet size.
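One more thing worth trying, assuming the trimming happens in the source's SQL command: CAST the trimmed value so the source metadata becomes a fixed-width NVARCHAR(500) (DT_WSTR) column instead of a LOB column that SSIS has to spool. A minimal sketch, with placeholder table and column names:
SELECT CAST(LEFT(YourColumn, 500) AS NVARCHAR(500)) AS YourColumnTrimmed -- non-LOB metadata
FROM dbo.YourTable;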
More info is provided in these links:
Set BLOBTempStoragePath and BufferTempStoragePath to Fast Drives
Troubleshooting Package Performance
Performance Issue with NVarchar(MAX) in SSIS
I'm facing a challenge: I need to store more than 8,000 characters, but I don't want to use nvarchar(max) because of its performance impact.
Is there any way to store up to 15,000 characters in a field without using nvarchar(max)?
And is there any way to have a field's size grow dynamically with the data, without using nvarchar(max)?
Kind of, yes, but this gets messy rapidly, and in fact the workaround would perform worse than the varchar(max) option. The issue is the 8,060-byte limit on a single row's in-page storage. You can exceed that limit, but only by accepting that the data is stored off-page, on a page elsewhere.
Preferred Option : use Varchar(Max) and allow LOB storage to be used.
Alternative : use multiple varchar(8000) fields and split / concatenate your string - the data will get Short Large Object'ed (SLOB'd) and the varchar(8000) fields will be stored off in different pages (sketched after this list). This will make it less performant - not to mention the split / concatenate overhead.
2nd Alternative - compress the data, but there is no guarantee it will still fit within the 8K limit.
Deprecated : Text - do not use this as a solution
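For illustration only, a minimal sketch of the split / concatenate alternative, assuming a 15,000-character value and illustrative table and column names:
-- two varchar(8000) columns hold the halves of one long value
CREATE TABLE dbo.LongText
(
    Id    int IDENTITY PRIMARY KEY,
    Part1 varchar(8000) NOT NULL,
    Part2 varchar(8000) NULL
);

DECLARE @v varchar(max) = REPLICATE(CONVERT(varchar(max), 'x'), 15000);

-- split on write
INSERT INTO dbo.LongText (Part1, Part2)
VALUES (SUBSTRING(@v, 1, 8000), SUBSTRING(@v, 8001, 8000));

-- concatenate on read
SELECT Part1 + ISNULL(Part2, '') AS FullValue
FROM dbo.LongText;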
In short - do not try to avoid varchar(max) - you would make your life a lot worse.
Attempts at changing the data type in Access have failed with the error:
"There isn't enough disk space or memory". Over 385,325 records exist in the table.
Attempts at the following links, among other Stack Overflow threads, have failed:
Can't change data type on MS Access 2007
Microsoft Access can't change the datatype. There isn't enough disk space or memory
The intention is to change the data type of one column from "Text" to "Number". The aforementioned links cannot accommodate that either, because of the table's size or because of the desired target data type.
Breaking up the table may not be an option because of the number of records.
Help on this would be appreciated.
I cannot tell for sure about MS Access, but in MS SQL one can avoid a table rebuild (which requires lots of time and space) by appending a new column that allows NULL values at the rightmost end of the table, updating the column using normal update queries, and AFAIK even dropping the old column and renaming the new one. So in the end it's just the position of that column that has changed.
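In T-SQL the steps would look roughly like this - a sketch with illustrative table and column names (TRY_CONVERT assumes SQL Server 2012 or later):
-- 1. append a nullable column at the rightmost end of the table
ALTER TABLE dbo.MyTable ADD NewValue int NULL;
GO
-- 2. populate it from the old text column with a normal update
UPDATE dbo.MyTable SET NewValue = TRY_CONVERT(int, OldValue);
GO
-- 3. drop the old column and take over its name
ALTER TABLE dbo.MyTable DROP COLUMN OldValue;
EXEC sp_rename 'dbo.MyTable.NewValue', 'OldValue', 'COLUMN';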
As for your 385,325 records (I'd expect that number to be correct): even if the table had 1,000 columns of 500 Unicode characters each, we'd end up with approximately 385,325 * 1000 * 500 * 2 bytes ~ 385 GB of data. That should nowadays not exceed what's available - so:
if it's disk space you're running out of, how about moving the data to some other computer, changing the DB there, and moving it back?
if the DB seems to be corrupted (and the standard tools didn't help - make a copy first), it will most probably help to create a new table or database using table-creation queries (better: create it manually and append the data).
So, let me start by saying I have had a hard time finding any documentation about this online - hence I am asking here. I have to manually calculate the size of a row in Microsoft SQL Server 2008 at work (I know this can be done via a query; however, due to some hardware issues, that is not presently possible). Either way, I figured this question might help others in the long run:
Within the database I am working in, there are a number of columns with data type NUMBER() - some of which have the precision and scale set. Now, I do know that precision affects size; so here is the question: what is the range of on-disk sizes, in bytes, for the data type NUMBER in SQL Server (any unit of measurement is fine, actually)?
Some documentation provides the possible ranges of values and the corresponding disk sizes. If you know of any documentation for this data type, please feel free to post it.
OBSERVATION:
I have found documentation for the type NUMERIC. Is that the same as - or a different version of - NUMBER?
As Andrew has mentioned, NUMBER must be a user-defined type, since there is no built-in data type named NUMBER in SQL Server. So no one here can tell you what characteristics this data type has.
You can execute the query below to find out the characteristics of this user-defined data type.
-- look up the user-defined type's length, precision, and scale
SELECT *
FROM sys.types
WHERE is_user_defined = 1
  AND name = 'NUMBER';
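To resolve the underlying system type in the same query, you can join sys.types back to itself (a sketch, assuming the type really is named NUMBER):
SELECT ut.name       AS user_type,
       st.name       AS base_type,
       ut.precision,
       ut.scale,
       ut.max_length AS bytes
FROM sys.types AS ut
JOIN sys.types AS st
    ON st.user_type_id = ut.system_type_id
WHERE ut.is_user_defined = 1
  AND ut.name = 'NUMBER';
If the base type turns out to be numeric (which is a real SQL Server type, unlike NUMBER, which is Oracle's name for it), the documented storage is 5, 9, 13, or 17 bytes for precisions 1-9, 10-19, 20-28, and 29-38 respectively.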
I'm storing data in a varbinary(max) column and, for client performance reasons, chunking writes through the .WRITE() function in SQL Server 2005. This works great, but due to the side effects I want to avoid the varbinary column being dynamically resized during each append.
What I'd like to do is optimize this by pre-allocating the varbinary column to the size I want. For example, if I'm going to drop 2 MB into the column, I would like to 'allocate' the column first, then .WRITE the real data using the offset/length parameters.
Is there anything in SQL that can help me here? Obviously I don't want to send a null byte array to the SQL server, as this would partially defeat the purpose of the .WRITE optimization.
If you're using a (MAX) data type, then anything above 8K goes into row-overflow storage, not in-page storage. So you just need to put in enough data to bring the row up to 8K, which takes up the row's in-page allocation; the rest goes into row-overflow storage anyway. There's some more here.
If you want to pre-allocate everything, including the row-overflow data, you can use something akin to the following (the example allocates 10,000 bytes):
SELECT CONVERT([varbinary](MAX), REPLICATE(CONVERT(varchar(MAX), '0'), 10000))
First of all, kudos for the answer provided - it was a great help! However, there is one slight change you may want to consider. The code above actually fills the varbinary field with a converted zero character (hex code 0x30). That may not be what you actually want, particularly if you want to perform binary operations on the field later. What I think is more useful is to fill the field with NUL bytes (hex code 0x00), so that all the bits are off by default. To do this, simply make the following correction:
SELECT CONVERT([varbinary](MAX), REPLICATE(CONVERT(varchar(MAX), CHAR(0)), 10000))
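For completeness, a hedged sketch of the pre-allocation combined with a chunked .WRITE append (table and column names are placeholders):
-- pre-allocate 2 MB of NUL bytes
CREATE TABLE dbo.BlobStore (Id int PRIMARY KEY, Payload varbinary(max));
INSERT INTO dbo.BlobStore (Id, Payload)
VALUES (1, CONVERT([varbinary](MAX), REPLICATE(CONVERT(varchar(MAX), CHAR(0)), 2097152)));

-- overwrite a 10-byte chunk in place at offset 0; the column does not grow
UPDATE dbo.BlobStore
SET Payload.WRITE(0x00010203040506070809, 0, 10)
WHERE Id = 1;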
I have been passed a piece of work that I can either do in my application or perhaps in SQL:
I have to get a date out of a string that may look like this:
1234567-DSP-01/01-VER-01/01
or like this:
1234567-VER-01/01-DSP-01/01
but may look like this:
00 12345 DISCH 01/01-VER-01/01 XXX X XXXXX
Yay. If it is a "DSP" then I want that date; if a "DISCH", then that date.
I am pulling the data out in a SQL Server view and would be happy to have the view transform the data for me. My application could do it, but that would add processor time. I could also see whether the data could be manipulated before it is entered into the DB, I suppose.
Thank you for your time.
An option would be to check for the presence of DSP or DISCH then substring out the date as necessary.
For example (I don't have SQL Server today, so I can't verify the syntax, sorry):
select
    date = case
               -- offsets assume the sample formats: the date starts right
               -- after 'DSP-' (3 + 1 chars) or 'DISCH ' (5 + 1 chars)
               when charindex('DSP', date_attribute) > 0
                   then substring(date_attribute, charindex('DSP', date_attribute) + 4, 5)
               when charindex('DISCH', date_attribute) > 0
                   then substring(date_attribute, charindex('DISCH', date_attribute) + 6, 5)
               else 'unknown'
           end
from myTable
don't store multiple items in the same column!
store the date in its own column when inserting the row!
add a new nullable column for the date
write an update that pulls the date out and sets the new column
alter the column to be not nullable
fix your save routine to pull the date out and insert it for you (steps 1-3 are sketched below)
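A sketch of steps 1-3 in T-SQL, reusing the CHARINDEX logic from the other answer (the new column name is illustrative, and the offsets assume the sample formats shown in the question):
-- 1. add a new nullable column for the date
alter table myTable add dsp_date char(5) null;

-- 2. pull the date out and set the new column
update myTable
set dsp_date = case
                   when charindex('DSP', date_attribute) > 0
                       then substring(date_attribute, charindex('DSP', date_attribute) + 4, 5)
                   when charindex('DISCH', date_attribute) > 0
                       then substring(date_attribute, charindex('DISCH', date_attribute) + 6, 5)
               end;

-- 3. only possible once every row has a date extracted
alter table myTable alter column dsp_date char(5) not null;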
If you do it in the view, you're adding processing time on SQL Server, which is in general a more expensive resource than an app, web, or other type of client.
I'd recommend you try to format the data when you insert it, or handle it in the application tier. Scaling an app tier horizontally is so much easier than scaling your SQL Server.
Edit
I mean that the database server's physical resources are usually more expensive than a properly designed application server's physical resources. This is because it is very easy to scale an application horizontally, while it is, in my opinion, an order of magnitude more expensive to scale a DB server horizontally - especially if you're dealing with a transactional database and need to manage merging.
I am not saying it is not possible, just that scaling a database server horizontally is a much more difficult task; hence it's more expensive. The only reason I pointed this out is that the OP raised a concern about using CPU cycles on the app server vs. the database server. Most applications I have worked with have been data-centric applications that processed GBs of data to get a user an answer. We initially put everything on the database server because it was easier than doing it in classic ASP and VB6 at the time. Over time the DB server got more and more loaded, until scaling vertically was no longer an option.
Database servers are also designed for retrieving and joining data. You should leave the formatting of the data to the application and business rules (in general, of course).