Retrieving MS SQL database size in culture independent format - sql

sp_helpdb returns strings like '50000.255 MB' in the db_size column.
These strings are culture-dependent: the string above means two different things in the US and in Germany, because German formatting uses the dot as the group separator (like the comma in US formatting).
Is there another method that returns a numeric, culture-independent value?

USE YourDB;
SELECT SUM(size / 128.0) AS FileSize FROM sys.database_files;
This returns the size in MB as a numeric value, so you can format or compute with it however you like.
Note: the size column holds the number of 8 KB pages in each database file, so dividing by 128 converts pages to megabytes.
http://msdn.microsoft.com/en-us/library/ms174397.aspx
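If you need the figure for every database on the instance rather than only the current one, the same pages-to-megabytes arithmetic should work against the sys.master_files catalog view, which also reports size in 8 KB pages. A minimal sketch, assuming you have permission to read that view; the cast to bigint is only a precaution against integer overflow when summing very large files:

```sql
-- Sketch: numeric (not formatted) size in MB for every database on the instance.
-- size is a count of 8 KB pages, so 128 pages = 1 MB.
SELECT DB_NAME(database_id) AS database_name,
       CAST(SUM(CAST(size AS bigint)) / 128.0 AS decimal(18, 2)) AS size_mb
FROM sys.master_files
GROUP BY database_id
ORDER BY size_mb DESC;
```

Because size_mb comes back as a decimal value rather than a string, the decimal separator is chosen by the client when it displays the number, so the ambiguity of sp_helpdb's db_size text does not arise.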

I do not think it is possible. The stored procedure sp_helpdb uses STR to convert the numeric size to varchar, and nothing in the documentation (that I can find) makes STR use , instead of . as the decimal symbol. Using SET LANGUAGE does not help.
Workaround, as suggested by Martin in a comment:
select replace(str(sum(convert(dec(17,2),size)) / 128,10,2) +' MB', '.', ',')
from sys.database_files

Related

LTRIM RTRIM not working for Chinese string SQL

I have a column named Text which receives from the end user the following string:
'复合模头滤网 φ245 120目*300目 24×120目 '
It ends with an unusual space, a bit wider than a regular space. However, it has the same ASCII code as the normal space, 32.
I used this SQL to trim the string, but it always returns the same string, untrimmed:
LTRIM(RTRIM([Text]))
The solution is to try to trim the character with ASCII code 32. The following code works perfectly:
TRIM(CHAR(32) from [ShortText])
To check whether it works, I tried it this way:
DECLARE @t TABLE (txt nvarchar(255));
INSERT INTO @t VALUES (TRIM(CHAR(32) FROM '复合模头滤网 φ245 120目*300目 24×120目 '));
SELECT txt, LEN(txt), ASCII(RIGHT(txt, 1)) AS ASCII_Char
-- 32 = space, 13 = CR, 10 = LF, 9 = tab
FROM @t
This character is U+3000 IDEOGRAPHIC SPACE, and, as documented, LTRIM, RTRIM and TRIM by default remove only U+0020 SPACE.
In SQL Server 2017 and later, you can use TRIM(characters FROM string) to specify which characters to remove:
DECLARE @t nvarchar(1000) = N'复合模头滤网 φ245 120目*300目 24×120目 ';
SELECT
DATALENGTH(@t) / 2 totalCharacters,
LEN(@t) totalCharactersTrimmed,
TRIM(@t) trimmedNormal,
DATALENGTH(TRIM(@t)) / 2 totalTrimmedNormal,
TRIM(NCHAR(0x3000) FROM @t) trimmedIdeographic,
TRIM(N' ' FROM @t) trimmedIdeographicLiteral,
DATALENGTH(TRIM(NCHAR(0x3000) FROM @t)) / 2 totalTrimmedIdeographic;
SELECT
UNICODE(NCHAR(0x3000)) unicodeNum,
ASCII(NCHAR(0x3000)) asciiNum;
db<>fiddle
You say it has the same ASCII code, but that is only because ASCII has no exact character for it. If you use the UNICODE function, you will see the difference, as the fiddle shows.
For such characters as these, you must make sure to use the nvarchar data type, and the NCHAR and UNICODE functions.
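Because the characters argument of TRIM is treated as a set, regular and ideographic spaces can be stripped in a single call. A minimal sketch for SQL Server 2017 and later; the variable names and the shortened sample text are illustrative, and the string is built with NCHAR(0x3000) so the trailing character is unambiguous. Lengths are measured with DATALENGTH / 2, which is exact for nvarchar text without surrogate pairs:

```sql
-- Sketch (SQL Server 2017+): strip U+0020 and U+3000 from both ends in one TRIM call.
DECLARE @t nvarchar(1000) = N'复合模头滤网 φ245' + NCHAR(0x3000) + N' ' + NCHAR(0x3000);
DECLARE @spaces nvarchar(2) = NCHAR(0x3000) + N' ';   -- the set of characters TRIM should remove

SELECT
    DATALENGTH(@t) / 2                        AS chars_before,
    DATALENGTH(TRIM(@spaces FROM @t)) / 2     AS chars_after,
    UNICODE(RIGHT(TRIM(@spaces FROM @t), 1))  AS last_code_point;   -- code point of '5'

-- Pre-2017 alternative: turn U+3000 into ordinary spaces, then use LTRIM/RTRIM.
-- Side effect: ideographic spaces inside the string become ordinary spaces too.
SELECT RTRIM(LTRIM(REPLACE(@t, NCHAR(0x3000), N' '))) AS trimmed_pre_2017;
```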

Printing out box drawing characters (extended-ascii) in SSMS

I want to print box-drawing characters in the Messages output in SSMS, e.g. ║, ░ or ╬.
The full list of characters which I have in my mind can be found here.
When I try PRINT '╬', it prints + instead of the expected ╬.
When I execute SELECT ASCII('╬'), it returns 43, and SELECT CHAR(43) returns (not surprisingly) +.
Is it related to collation? If so, how can I find which collation to use?
A plain string literal in SQL Server is, by default, of type CHAR / VARCHAR. This type is 1-byte-encoded extended ASCII: the lower half is the plain Latin character set, and the upper half depends on the collation. This means there is very little support for non-standard characters.
The second character type is NCHAR / NVARCHAR. This is (almost) Unicode, very close to UTF-16: the actual encoding is two-byte UCS-2. Support for non-standard characters is (almost) complete. Any literal prefixed with N is treated as NCHAR / NVARCHAR:
Try this:
SELECT '╬',N'╬';
DECLARE @str1a VARCHAR(10)='╬';
DECLARE @str1b VARCHAR(10)=N'╬'; --The NVARCHAR literal is changed to VARCHAR
DECLARE @str2 NVARCHAR(10)=N'╬';
SELECT @str1a,@str1b,@str2;
The functions that return a code point, and conversely the character for a code point, also come in two flavours:
SELECT ASCII('a'), UNICODE(N'a')
,ASCII('╬'), UNICODE(N'╬')
,CHAR(97),NCHAR(97),CHAR(43),NCHAR(43)
,NCHAR(9580)--does not work with `CHAR`
You need to print them as Unicode, i.e. prefix the literals with N:
PRINT N'╬'
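Since NCHAR accepts a Unicode code point, a whole run of box-drawing characters can be generated rather than typed. A small sketch; the range U+2550 to U+256C is chosen only for illustration (U+256C is the ╬ from the question, NCHAR(9580)), and @cp and @line are illustrative names. The accumulator must be nvarchar, otherwise the characters are converted to the varchar code page, which is how ╬ turned into + in the question:

```sql
DECLARE @cp int = 0x2550;           -- U+2550 BOX DRAWINGS DOUBLE HORIZONTAL (═)
DECLARE @line nvarchar(100) = N'';  -- nvarchar keeps the characters intact

WHILE @cp <= 0x256C                 -- U+256C BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL (╬)
BEGIN
    SET @line += NCHAR(@cp);
    SET @cp += 1;
END;

PRINT @line;

-- A small box, using N-prefixed literals throughout
PRINT N'╔══╗';
PRINT N'║░░║';
PRINT N'╚══╝';
```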

Min length constraint preventing from inserting spaces into column [duplicate]

I have the following test table in SQL Server 2005:
CREATE TABLE [dbo].[TestTable]
(
[ID] [int] NOT NULL,
[TestField] [varchar](100) NOT NULL
)
Populated with:
INSERT INTO TestTable (ID, TestField) VALUES (1, 'A value'); -- Len = 7
INSERT INTO TestTable (ID, TestField) VALUES (2, 'Another value      '); -- Len = 13 + 6 spaces
When I try to find the length of TestField with the SQL Server LEN() function it does not count the trailing spaces - e.g.:
-- Note: the results grid in SQL Server 2005 does not show the trailing spaces of TestField either.
SELECT
ID,
TestField,
LEN(TestField) As LenOfTestField -- Does not include trailing spaces
FROM
TestTable
How do I include the trailing spaces in the length result?
This is clearly documented by Microsoft in MSDN at http://msdn.microsoft.com/en-us/library/ms190329(SQL.90).aspx, which states that LEN "returns the number of characters of the specified string expression, excluding trailing blanks". It is, however, an easy detail to miss if you're not wary.
Instead, use the DATALENGTH function (see http://msdn.microsoft.com/en-us/library/ms173486(SQL.90).aspx), which "returns the number of bytes used to represent any expression".
Example:
SELECT
ID,
TestField,
LEN(TestField) As LenOfTestField, -- Does not include trailing spaces
DATALENGTH(TestField) As DataLengthOfTestField -- Shows the true length of data, including trailing spaces.
FROM
TestTable
You can use this trick:
LEN(Str + 'x') - 1
I use this method:
LEN(REPLACE(TestField, ' ', '.'))
I prefer this over DATALENGTH because this works with different data types, and I prefer it over adding a character to the end because you don't have to worry about the edge case where your string is already at the max length.
Note: I would test the performance before using it against a very large data set; though I just tested it against 2M rows and it was no slower than LEN without the REPLACE...
"How do I include the trailing spaces in the length result?"
You get someone to file a SQL Server enhancement request/bug report, because nearly all the workarounds listed here for this amazingly simple issue have some deficiency or are inefficient. This still appears to be true in SQL Server 2012. The auto-trimming behaviour may stem from ANSI/ISO SQL-92, but there seem to be some holes (or a lack of counting them).
Please vote up "Add setting so LEN counts trailing whitespace" here:
https://feedback.azure.com/forums/908035-sql-server/suggestions/34673914-add-setting-so-len-counts-trailing-whitespace
Retired Connect link:
https://connect.microsoft.com/SQLServer/feedback/details/801381
There are problems with the two top-voted answers. The answer recommending DATALENGTH is prone to programmer errors. The result of DATALENGTH must be divided by 2 for NVARCHAR types, but not for VARCHAR types. This requires knowledge of the type you're getting the length of, and if that type changes, you have to diligently change the places where you used DATALENGTH.
There is also a problem with the most upvoted answer (which I admit was my preferred way to do it until this problem bit me). If the thing you are getting the length of is of type NVARCHAR(4000), and it actually contains a string of 4000 characters, SQL will ignore the appended character rather than implicitly cast the result to NVARCHAR(MAX). The end result is an incorrect length. The same thing will happen with VARCHAR(8000).
What I've found works, is nearly as fast as plain LEN, is faster than LEN(@s + 'x') - 1 for large strings, and does not assume the underlying character width, is the following:
DATALENGTH(@s) / DATALENGTH(LEFT(LEFT(@s, 1) + 'x', 1))
This gets the data length and divides it by the data length of a single character from the string. Appending 'x' covers the case where the string is empty (which would otherwise cause a divide by zero). This works whether @s is VARCHAR or NVARCHAR. Taking the LEFT 1 character before the append shaves some time when the string is large. The problem with this, though, is that it does not work correctly with strings containing surrogate pairs.
There is another way mentioned in a comment to the accepted answer, using REPLACE(@s, ' ', 'x'). That technique gives the correct answer, but is a couple of orders of magnitude slower than the other techniques when the string is large.
Given the problems that surrogate pairs introduce for any technique that uses DATALENGTH, the safest method I know of that gives correct answers is the following:
LEN(CONVERT(NVARCHAR(MAX), @s) + 'x') - 1
This is faster than the REPLACE technique, and much faster with longer strings. It is basically the LEN(@s + 'x') - 1 technique, but with protection for the edge case where the string has a length of 4000 (for nvarchar) or 8000 (for varchar), so that the correct answer is given even then. It should also handle strings with surrogate pairs correctly.
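The NVARCHAR(4000) edge case described above can be reproduced in a few lines. A minimal sketch; @s is an illustrative variable holding exactly 4000 characters, the last ten of which are spaces. Concatenating two non-MAX nvarchar values is capped at 4000 characters, so the appended x is silently lost unless one side is converted to NVARCHAR(MAX) first:

```sql
DECLARE @s nvarchar(4000) = REPLICATE(N'a', 3990) + SPACE(10);  -- exactly 4000 characters

SELECT
    LEN(@s)                                     AS len_plain,         -- 3990: trailing spaces ignored
    LEN(@s + N'x') - 1                          AS len_plus_x,        -- 3989: the x is truncated away
    DATALENGTH(@s) / 2                          AS datalength_chars,  -- 4000, correct only because @s is nvarchar
    LEN(CONVERT(nvarchar(max), @s) + N'x') - 1  AS len_safe;          -- 4000
```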
LEN ignores trailing spaces, so I found that reversing the string works, because it moves them to the front:
LEN(REVERSE(TestField))
So if you wanted to, you could say
SELECT
t.TestField,
LEN(REVERSE(t.TestField)) AS [Reverse],
LEN(t.TestField) AS [Count]
FROM TestTable t
WHERE LEN(REVERSE(t.TestField)) <> LEN(t.TestField)
Of course, this does not work for strings that also have leading spaces.
You also need to ensure that your data is actually saved with the trailing blanks. When SET ANSI_PADDING is OFF (non-default), "trailing blanks in character values inserted into a varchar column are trimmed."
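A quick way to see this is to create two temporary tables under different ANSI_PADDING settings. A minimal sketch; the table names are illustrative, and SET ANSI_PADDING OFF is deprecated, so it appears here only to reproduce the behaviour. The setting is captured when the column is created, not when rows are inserted:

```sql
SET ANSI_PADDING OFF;   -- deprecated; used only to demonstrate the trimming
CREATE TABLE #PaddingOff (v varchar(20));
SET ANSI_PADDING ON;    -- the default
CREATE TABLE #PaddingOn (v varchar(20));

INSERT INTO #PaddingOff (v) VALUES ('abc   ');
INSERT INTO #PaddingOn  (v) VALUES ('abc   ');

SELECT (SELECT DATALENGTH(v) FROM #PaddingOff) AS bytes_padding_off,  -- 3: trailing blanks trimmed on insert
       (SELECT DATALENGTH(v) FROM #PaddingOn)  AS bytes_padding_on;   -- 6: trailing blanks kept
```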
If you dislike string concatenation, you could define a CLR function that returns the String's Length property.
I use LEN('x' + @string + 'x') - 2 in my production use cases.
If you dislike DATALENGTH because of the varchar/nvarchar concerns, how about:
select DATALENGTH(@var)/isnull(nullif(DATALENGTH(left(@var,1)),0),1)
which is just
select DATALENGTH(@var)/DATALENGTH(left(@var,1))
wrapped with divide-by-zero protection.
By dividing by the DATALENGTH of a single char, we get the length normalised.
(Of course, still issues with surrogate-pairs if that's a concern.)
This is the best algorithm I've come up with which copes with the maximum length and variable byte count per character issues:
ISNULL(LEN(STUFF(@Input, 1, 1, '') + '.'), 0)
This is a variant of the LEN(@Input + '.') - 1 algorithm, but by using STUFF to remove the first character we ensure that the modified string doesn't exceed the maximum length, and we remove the need to subtract 1.
ISNULL(..., 0) is added to deal with the case where @Input = '', which causes STUFF to return NULL.
This does have the side effect that the result is also 0 when @Input is NULL, which is inconsistent with LEN(NULL) returning NULL, but this could be dealt with by logic outside this function if need be.
Here are the results using LEN(@Input), LEN(@Input + '.') - 1, LEN(REPLACE(@Input, ' ', '.')) and the above STUFF variant, with a sample of @Input = CAST(' S' + SPACE(3998) AS NVARCHAR(4000)) over 1000 iterations:
Algorithm    DataLength (bytes)  ExpectedResult  Result  ms (1000 iterations)
LEN          8000                4000            2       14
+DOT-1       8000                4000            1       13
REPLACE      8000                4000            4000    514
STUFF+DOT    8000                4000            4000    0
In this case the STUFF algorithm is actually faster than LEN()!
I can only assume that internally SQL Server looks at the last character and, if it is not a space, optimizes the calculation.
But that's a good result, eh?
Don't use the REPLACE option unless you know your strings are small; it's hugely inefficient.
Use:
SELECT DATALENGTH('string ')

Trimming a column with bad data

My data looks like
ID LPNumber
1 30;#TEST123
2 302;#TEST1232
How can I update LPNumber to drop everything up to and including the #, so I'm left with the following:
ID LPNumber
1 TEST123
2 TEST1232
I've looked at SQL Server Replace, but can't think of a viable way of checking for the ";"
On the MSDN REPLACE page, the menu on the left gives the complete list of string functions available.
UPDATE
MyTable
SET
LPNumber = SUBSTRING(LPNumber, CHARINDEX('#', LPNumber)+1, 8000);
I'll let you work out (from MSDN) the filter needed in case there is no # in the column...
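One way to add that filter, as a sketch: a table variable (an illustrative stand-in for MyTable) holds the question's two sample rows plus a hypothetical third row without a #. Strictly, CHARINDEX returns 0 when the # is missing, so SUBSTRING would start at position 1 and leave such a row unchanged anyway; the WHERE clause mainly avoids rewriting rows that don't need it, and it also skips NULLs:

```sql
DECLARE @MyTable TABLE (ID int, LPNumber varchar(100));
INSERT INTO @MyTable (ID, LPNumber)
VALUES (1, '30;#TEST123'),
       (2, '302;#TEST1232'),
       (3, 'NOHASH');                  -- hypothetical row without '#'

UPDATE @MyTable
SET LPNumber = SUBSTRING(LPNumber, CHARINDEX('#', LPNumber) + 1, 8000)
WHERE CHARINDEX('#', LPNumber) > 0;   -- touch only rows that contain '#'

SELECT ID, LPNumber FROM @MyTable ORDER BY ID;
```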
Edit:
Why 8000?
The longest non-LOB string length is 8000 so it is shorthand for "until end of string". You can use 2147483647 too for max columns or to make it consistent.
Also, LEN can bollix you.
SET ANSI_PADDING is ON by default
LEN ignores trailing spaces
You'd need to use DATALENGTH but then you need to know the data type because this counts bytes, not characters. See https://stackoverflow.com/a/2557843/27535 for example
So using a magic number is perhaps a lesser evil...
Use CHARINDEX(), LEN() and RIGHT() instead.
RIGHT(LPNumber, LEN(LPNumber) - CHARINDEX('#', LPNumber, 0))

How to extract numerical data from SQL result

Suppose there is a table "A" with 2 columns - ID (INT), DATA (VARCHAR(100)).
Executing "SELECT DATA FROM A" returns a result that looks like:
DATA
---------------------
Nowshak 7,485 m
Maja e Korabit (Golem Korab) 2,764 m
Tahat 3,003 m
Morro de Moco 2,620 m
Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)
Mount Kosciuszko 2,229 m
Grossglockner 3,798 m
// the DATA continues...
---------------------
How can I extract only the numerical data using some kind of string processing function in the SELECT SQL query so that the result from a modified SELECT would look like this:
DATA (in INTEGER - not varchar)
---------------------
7485
2764
3003
2620
6960
2229
3798
// the DATA in INTEGER continues...
---------------------
By the way, it would be best if this could be done in a single SQL statement. (I am using IBM DB2 version 9.5)
Thanks :)
I know this thread is old, and the OP doesn't need the answer, but I had to figure this out with a few hints from this and other threads. They all seem to be missing the exact answer.
The easy way to do this is to TRANSLATE all unneeded characters to a single character, then REPLACE that single character with an empty string.
DATA = 'Nowshak 7,485 m'
-- removes all non-digit characters, leaving only the numbers
REPLACE(TRANSLATE(TRIM(DATA), '_____________________________________________________________________________________________', ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ`~!@#$%^&*()-_=+\|[]{};:",.<>/?'), '_', '')
=> '7485'
To break down the TRANSLATE command:
TRANSLATE( FIELD or String, <to characters>, <from characters> )
e.g.
DATA = 'Sample by John'
TRANSLATE(DATA, 'XYZ', 'abc')
=> a becomes X, b becomes Y, c becomes Z
=> 'SXmple Yy John'
** Note: I can't speak to performance or version compatibility. I'm on a 9.x version of DB2, and new to the technology. Hope this helps someone.
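Putting the pieces together for the original question, here is a sketch of a single DB2 statement that returns INTEGER values. It assumes table A and column DATA from the question and has not been verified on DB2 9.5 specifically. Instead of a long run of underscores, it relies on TRANSLATE's optional fourth argument, the pad character, which DB2 uses to extend a to-string that is shorter than the from-string; NULLIF turns a row with no digits into NULL rather than a cast error:

```sql
SELECT CAST(
         NULLIF(
           REPLACE(
             -- every listed non-digit maps to '_' (short to-string, padded with '_')
             TRANSLATE(DATA, '_',
                       ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ`~!@#$%^&*()-=+\|[]{};:",.<>/?',
                       '_'),
             '_', ''),   -- drop the placeholders, leaving only digits
           '')           -- no digits left: NULL instead of a failed cast
         AS INTEGER) AS DATA_NUM
FROM A;
```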
In Oracle:
SELECT TO_NUMBER(REGEXP_REPLACE(data, '[^0-9]', ''))
FROM a
In PostgreSQL:
SELECT CAST(REGEXP_REPLACE(data, '[^0-9]', '', 'g') AS INTEGER)
FROM a
In MS SQL Server and DB2, you'll need to create UDFs for regular expressions and query like this.
See links for more details.
Doing a quick online search for DB2, the best built-in function I can find is TRANSLATE. It lets you specify a list of characters you want to change to other characters. It's not ideal, but you can specify every character that you want to strip out, that is, every non-numeric character available...
(Yes, that's a long list, a very long list, which is why I say it's not ideal)
TRANSLATE(data, ' ', 'abc...XYZ,./\<>?|[and so on]')
Alternatively you need to create a user defined function to search for the number. There are a few alternatives for that.
Check each character one by one and keep it only if it's a numeric.
If you know what precedes the number and what follows the number, you can search for those and keep what is in between...
To elaborate on Dems's suggestion, the approach I've used is a scalar user-defined function (UDF) that accepts an alphanumeric string, recursively iterates through it (one byte per iteration), and suppresses the non-numeric characters from the output. The recursive expression generates a row per iteration, but only the final row is kept and returned to the calling application.
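The recursion can be sketched without a UDF as a DB2 recursive common table expression. This is an illustration under assumptions, not the answer author's function: it assumes A has the ID and DATA columns from the question and that DATA fits in VARCHAR(100), and it has not been verified on DB2 9.5. Each iteration appends the character at position pos only if it lies between '0' and '9', and only the final row per ID is kept. DB2 may warn that the recursion could loop; the pos <= LENGTH(DATA) predicate is what bounds it:

```sql
WITH walk (id, pos, digits) AS (
    -- initialization: one row per input row, no digits collected yet
    SELECT ID, 1, CAST('' AS VARCHAR(100))
    FROM A
  UNION ALL
    -- iteration: inspect the character at pos and keep it only if it is a digit
    SELECT w.id,
           w.pos + 1,
           CAST(w.digits ||
                CASE WHEN SUBSTR(src.DATA, w.pos, 1) BETWEEN '0' AND '9'
                     THEN SUBSTR(src.DATA, w.pos, 1)
                     ELSE ''
                END AS VARCHAR(100))
    FROM walk w
    JOIN A src ON src.ID = w.id
    WHERE w.pos <= LENGTH(src.DATA)      -- stop after the last character
)
SELECT w.id,
       CAST(NULLIF(w.digits, '') AS INTEGER) AS data_num
FROM walk w
JOIN A src ON src.ID = w.id
WHERE w.pos = LENGTH(src.DATA) + 1;      -- keep only the final row per input row
```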