How to check if a string is a uniqueidentifier? - sql

Is there an equivalent to IsDate or IsNumeric for uniqueidentifier (SQL Server)?
Or is there anything equivalent to (C#) TryParse?
Otherwise I'll have to write my own function, but I want to make sure I'm not reinventing the wheel.
The scenario I'm trying to cover is the following:
SELECT something FROM table WHERE IsUniqueidentifier(column) = 1

SQL Server 2012 makes this all much easier with TRY_CONVERT(UNIQUEIDENTIFIER, expression)
SELECT something
FROM your_table
WHERE TRY_CONVERT(UNIQUEIDENTIFIER, your_column) IS NOT NULL;
For earlier versions of SQL Server, the existing answers miss a few points. As a result they may either fail to match strings that SQL Server will in fact cast to UNIQUEIDENTIFIER without complaint, or still end up causing invalid cast errors.
SQL Server accepts GUIDs either wrapped in {} or without this.
Additionally it ignores extraneous characters at the end of the string. Both SELECT CAST('{5D944516-98E6-44C5-849F-9C277833C01B}ssssssssss' as uniqueidentifier) and SELECT CAST('5D944516-98E6-44C5-849F-9C277833C01BXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' as uniqueidentifier) succeed for instance.
Under most default collations the LIKE '[a-zA-Z0-9]' will end up matching characters such as À or Ë.
Finally, if casting rows in a result to uniqueidentifier, it is important to put the cast attempt in a case expression, as the cast may occur before the rows are filtered by the WHERE.
So (borrowing @r0d30b0y's idea) a slightly more robust version might be
;WITH T(C)
AS (SELECT '5D944516-98E6-44C5-849F-9C277833C01B'
UNION ALL
SELECT '{5D944516-98E6-44C5-849F-9C277833C01B}'
UNION ALL
SELECT '5D944516-98E6-44C5-849F-9C277833C01BXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
UNION ALL
SELECT '{5D944516-98E6-44C5-849F-9C277833C01B}ssssssssss'
UNION ALL
SELECT 'ÀD944516-98E6-44C5-849F-9C277833C01B'
UNION ALL
SELECT 'fish')
SELECT CASE
WHEN C LIKE expression + '%'
OR C LIKE '{' + expression + '}%' THEN CAST(C AS UNIQUEIDENTIFIER)
END
FROM T
CROSS APPLY (SELECT REPLACE('00000000-0000-0000-0000-000000000000', '0', '[0-9a-fA-F]') COLLATE Latin1_General_BIN) C2(expression)
WHERE C LIKE expression + '%'
OR C LIKE '{' + expression + '}%'
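The collation point made above can be checked in isolation. The following is a minimal sketch, not part of the original answer; whether the first column returns 1 depends on the default collation of the database it runs in.

```sql
-- Sketch: compare how a character range in LIKE treats an accented letter
-- under the current default collation versus an explicit binary collation.
SELECT
    CASE WHEN N'À' LIKE '[a-zA-Z0-9]'
         THEN 1 ELSE 0 END AS matches_under_default_collation,
    CASE WHEN N'À' COLLATE Latin1_General_BIN LIKE '[a-zA-Z0-9]'
         THEN 1 ELSE 0 END AS matches_under_binary_collation;
```

Under a binary collation the ranges are compared by code point, so À falls outside all three of them, which is why the query above applies COLLATE Latin1_General_BIN to the pattern.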

Not mine; I found this online and thought I'd share.
SELECT 1 WHERE @StringToCompare LIKE
REPLACE('00000000-0000-0000-0000-000000000000', '0', '[0-9a-fA-F]');

SELECT something
FROM table1
WHERE column1 LIKE '[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]-[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]-[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]-[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]-[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]';
UPDATE:
...but I much prefer the approach in the answer by @r0d30b0y:
SELECT something
FROM table1
WHERE column1 LIKE REPLACE('00000000-0000-0000-0000-000000000000', '0', '[0-9a-fA-F]');

I am not aware of anything that you could use "out of the box" - you'll have to write this on your own, I'm afraid.
If you can: try to write this inside a C# library and deploy it into SQL Server as a SQL-CLR assembly - then you could use things like Guid.TryParse() which is certainly much easier to use than anything in T-SQL....

A variant of r0d30b0y's answer is to use PATINDEX to find a GUID within a longer string:
PATINDEX('%'+REPLACE('00000000-0000-0000-0000-000000000000', '0', '[0-9a-fA-F]')+'%',@StringToCompare) > 0
I had to use this to find GUIDs within a URL string.
HTH
Dave

I like to keep it simple. A GUID has four hyphens in it, even if it is just a string:
WHERE column like '%-%-%-%-%'

Though an older post, just a thought for a quick test ...
SELECT [A].[INPUT],
CAST([A].[INPUT] AS [UNIQUEIDENTIFIER])
FROM (
SELECT '5D944516-98E6-44C5-849F-9C277833C01B' Collate Latin1_General_100_BIN AS [INPUT]
UNION ALL
SELECT '{5D944516-98E6-44C5-849F-9C277833C01B}'
UNION ALL
SELECT '5D944516-98E6-44C5-849F-9C277833C01BXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
UNION ALL
SELECT '{5D944516-98E6-44C5-849F-9C277833C01B}ssssssssss'
UNION ALL
SELECT 'ÀD944516-98E6-44C5-849F-9C277833C01B'
UNION ALL
SELECT 'fish'
) [A]
WHERE PATINDEX('[^0-9A-F-{}]%', [A].[INPUT]) = 0

This is a function based on the concept of some earlier comments. This function is very fast.
CREATE FUNCTION [dbo].[IsGuid] (@input varchar(50))
RETURNS bit AS
BEGIN
RETURN
case when @input like '[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]-[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]-[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]-[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]-[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]'
then 1 else 0 end
END
GO
/*
Usage:
select [dbo].[IsGuid]('123') -- Returns 0
select [dbo].[IsGuid]('ebd8aebd-7ea3-439d-a7bc-e009dee0eae0') -- Returns 1
select * from SomeTable where dbo.IsGuid(TableField) = 0 -- Returns table with all non convertable items!
*/

DECLARE @guid_string nvarchar(256) = 'ACE79678-61D1-46E6-93EC-893AD559CC78'
SELECT
CASE WHEN @guid_string LIKE '________-____-____-____-____________'
THEN CONVERT(uniqueidentifier, @guid_string)
ELSE NULL
END

You can write your own UDF. This is a simple approximation to avoid the use of a SQL-CLR assembly.
CREATE FUNCTION dbo.isuniqueidentifier (@ui varchar(50))
RETURNS bit AS
BEGIN
RETURN case when
substring(@ui,9,1)='-' and
substring(@ui,14,1)='-' and
substring(@ui,19,1)='-' and
substring(@ui,24,1)='-' and
len(@ui) = 36 then 1 else 0 end
END
GO
You can then improve it to check that the remaining characters are all hex digits.
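One way the suggested hex check might look is sketched below. This is an illustration rather than the answerer's code: the function name dbo.isuniqueidentifier_hex is hypothetical, and the pattern is borrowed from the REPLACE-based answers elsewhere in this thread. The binary collation is an assumption made so that accented letters cannot slip into the a-f ranges.

```sql
-- Sketch only: dbo.isuniqueidentifier_hex is a hypothetical name.
-- The LIKE pattern fixes the hyphen positions and allows only hex digits elsewhere.
CREATE FUNCTION dbo.isuniqueidentifier_hex (@ui varchar(50))
RETURNS bit AS
BEGIN
    RETURN CASE
        WHEN len(@ui) = 36
         AND @ui COLLATE Latin1_General_BIN LIKE
             REPLACE('00000000-0000-0000-0000-000000000000', '0', '[0-9a-fA-F]')
        THEN 1 ELSE 0 END
END
GO

-- Example calls: a well-formed GUID string and one with non-hex letters.
SELECT dbo.isuniqueidentifier_hex('ebd8aebd-7ea3-439d-a7bc-e009dee0eae0');
SELECT dbo.isuniqueidentifier_hex('ZZZZZZZZ-7ea3-439d-a7bc-e009dee0eae0');
```

The first call should return 1 and the second 0. Unlike a plain CAST, this version rejects the brace-wrapped form and strings with trailing characters.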

I use :
ISNULL(convert(nvarchar(50), userID), 'NULL') = 'NULL'

I had some Test users that were generated with AutoFixture, which uses GUIDs by default for generated fields. My FirstName fields for the users that I need to delete are GUIDs or uniqueidentifiers. That's how I ended up here.
I was able to cobble together some of your answers into this.
SELECT UserId FROM [Membership].[UserInfo] Where TRY_CONVERT(uniqueidentifier, FirstName) is not null

Use RLIKE for MYSQL
SELECT 1 WHERE @StringToCompare
RLIKE REPLACE('00000000-0000-0000-0000-000000000000', '0', '[0-9a-fA-F]');

In the simplest scenario, when you are sure that a non-GUID value can't contain four '-' signs:
SELECT * FROM City WHERE Name LIKE('%-%-%-%-%')

In BigQuery you can use
SELECT *
FROM table
WHERE
REGEXP_CONTAINS(uuid, REPLACE('^00000000-0000-0000-0000-000000000000$', '0', '[0-9a-fA-F]'))

Related

mssql select all nvarchar with wrong encoding

I am working with an old database where someone didn't encode the data the right way before inserting it, which results in text like
"Wrong t�xt" (in my case the '�' is a ø).
I am looking for a way to find all rows where the column contains data like this, so i can correct it.
So far I have tried using a regex-like pattern such as
SELECT * FROM table WHERE ([colm] not like '[a-zA-Z\s]%')
but no matter what I do, I can't find a way to select only the ones containing the '�'.
A search like
SELECT * FROM table WHERE ([colm] like '%�%')
won't return anything either (I tried it, just in case).
I have been searching for this on Google and here on Stack Overflow, but either no one else has this problem, or I am searching for the wrong thing.
So if someone would be so kind to help me with this, I would be really happy.
Thanks for your time.
Assuming the character in the string really is U+FFFD REPLACEMENT CHARACTER (�), and it's not displayed as a replacement character because there are actually other bytes in there that can't be decoded properly, you can find it with
SELECT * FROM table WHERE [colm] LIKE N'%�%' COLLATE Latin1_General_BIN2
Or (to avoid any further issues with encoding mangling characters)
SELECT * FROM table WHERE [colm] LIKE N'%' + NCHAR(0xfffd) + N'%' COLLATE Latin1_General_BIN2
Unicode is required because � does not exist in any single-byte collation, and a binary collation is required because the regular collations treat � as if it did not occur in strings at all.
Try this:
WHERE [colm] not like N'%[a-zA-Z]%'
Of course, this should return values with numbers, spaces, and punctuation.
As Jeroen mentioned, using a binary seems to be the way to go. Personally I would suggest using NGrams4k here, but I built a quick tally table instead that does the job:
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3, N N4)
SELECT V.Colm
FROM (VALUES(N'Wrong t�xt" (in my case the ''�'' is a ø)'),
(N'This string is ok'))V(colm)
JOIN Tally T ON LEN(V.Colm) >= T.I
CROSS APPLY (VALUES(SUBSTRING(V.Colm,T.I,1))) SS(C)
GROUP BY V.colm
HAVING COUNT(CASE CONVERT(binary(2),SS.C) WHEN 0xFDFF THEN 1 END) > 0;
You could replace occurrences of the U+FFFD REPLACEMENT CHARACTER (�) and compare the result with the original value:
SELECT *
, CASE WHEN CONVERT(VARBINARY(MAX), t.colm) = CAST(REPLACE(CONVERT(VARBINARY(MAX), t.colm), 0xFDFF, 0x) AS VARBINARY(MAX)) THEN 1 ELSE 0 END AS EncodingCorrect
FROM (
SELECT N'Wrong t�xt" (in my case the ''�'' is a ø)' AS colm
UNION ALL
SELECT 'Correct text'
UNION ALL
SELECT 'Wrong t?xt" (in my case the ''?'' is a ø)'
) t
@Jeroen Mostert's suggestion WHERE colm LIKE N'%�%' COLLATE Latin1_General_BIN2 seems like the better and more readable solution.

Determine if zip code contains numbers only

I have a field called zip, type char(5), which contains zip codes like
12345
54321
ABCDE
I'd like to check with an sql statement if a zip code contains numbers only.
The following isn't working
SELECT * FROM S1234.PERSON
WHERE ZIP NOT LIKE '%'
It can't work because even '12345' is an "array" of characters (it is '%', right?).
I found out that the following is working:
SELECT * FROM S1234.PERSON
WHERE ZIP NOT LIKE ' %'
It has a space before %. Why is this working?
If you use SQL Server 2012 or up the following script should work.
DECLARE @t TABLE (Zip VARCHAR(10))
INSERT INTO @t VALUES ('12345')
INSERT INTO @t VALUES ('54321')
INSERT INTO @t VALUES ('ABCDE')
SELECT *
FROM @t AS t
WHERE TRY_CAST(Zip AS NUMERIC) IS NOT NULL
Using answer from here to check if all are digit
SELECT col1,col2
FROM
(
SELECT col1,col2,
CASE
WHEN LENGTH(RTRIM(TRANSLATE(ZIP , '*', ' 0123456789'))) = 0
THEN 0 ELSE 1
END as IsAllDigit
FROM S1234.PERSON
) AS Z
WHERE IsAllDigit=0
DB2 does not have a regular expression facility like MySQL's REGEXP.
Use the ISNUMERIC function.
ISNUMERIC returns 1 if the parameter evaluates to a valid numeric type and 0 if it does not.
EXAMPLE:
SELECT * FROM S1234.PERSON
WHERE ISNUMERIC(ZIP) = 1
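Note that in T-SQL, ISNUMERIC is broader than "digits only": it also returns 1 for signs, currency symbols, decimal points and exponent notation. The sketch below, written for SQL Server with a hypothetical table variable @t, puts it next to the NOT LIKE '%[^0-9]%' test that appears later in this thread, which accepts only strings made up entirely of digits.

```sql
-- Sketch: compare ISNUMERIC with a strict digits-only test.
-- @t and its sample values are hypothetical.
DECLARE @t TABLE (Zip varchar(10));
INSERT INTO @t (Zip)
VALUES ('12345'), ('ABCDE'), ('$'), ('1e5'), ('-1234'), ('12.34');

SELECT Zip,
       ISNUMERIC(Zip) AS IsNumericResult,
       -- No character outside 0-9 anywhere in the string, and not empty.
       CASE WHEN Zip <> '' AND Zip NOT LIKE '%[^0-9]%'
            THEN 1 ELSE 0 END AS DigitsOnly
FROM @t;
```

Only '12345' should come back with DigitsOnly = 1, while ISNUMERIC also accepts values such as '$' and '1e5'.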
Your statement doesn't validate against numbers but it says get everything that doesn't start with a space.
Let's suppose your ZIP code is a US ZIP code, composed of 5 digits.
db2 "with val as (
select *
from S1234.PERSON t
where xmlcast(xmlquery('fn:matches(\$ZIP,''^\d{5}$'')') as integer) = 1
)
select * from val"
For more information about xQuery:fn:matches: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.xml.doc/doc/xqrfnmat.html
MySQL does not have a native ISNUMERIC() function. This would be pretty straightforward in Excel with the ISNUMBER() function, or in T-SQL with ISNUMERIC(), but neither works in MySQL, so after a little searching around I came across this solution:
SELECT * FROM S1234.PERSON
WHERE ZIP REGEXP '^[0-9]+$'
Effectively we're running a regular expression against the contents of the ZIP field; the ^ and $ anchors make it match only values that consist entirely of digits. It may seem like using a sledgehammer to crack a nut, and I have no idea how its performance compares with a simpler approach, but it worked, and I guess that's the point.
I have made a more error-proof version based on the solution https://stackoverflow.com/a/36211270/565525, adding an intermediate result and some examples:
select
test_str
, TRIM(TRANSLATE(replace(trim(test_str), ' ', 'x'), 'yyyyyyyyyyy', '0123456789'))
, case when length(TRIM(TRANSLATE(replace(trim(test_str), ' ', 'x'), 'yyyyyyyyyyy', '0123456789')))=5 then '5-digit-zip' else 'not 5d-zip' end is_zip
from (VALUES
(' 123 ' )
,(' abc ' )
,(' a12 ' )
,(' 12 3 ')
,(' 99435 ')
,('99323' )
) AS X(test_str)
;
The result for this example set is:
TEST_STR 2        IS_ZIP
-------- -------- -----------
123      yyy      not 5d-zip
abc      abc      not 5d-zip
a12      ayy      not 5d-zip
12 3     yyxy     not 5d-zip
99435    yyyyy    5-digit-zip
99323    yyyyy    5-digit-zip
Try checking if there's a difference between lower case and upper case. Numerics and special chars will look the same:
SELECT *
FROM S1234.PERSON
WHERE UPPER(ZIP COLLATE Latin1_General_CS_AI ) = LOWER(ZIP COLLATE Latin1_General_CS_AI)
Here's a working example for the case where you'd want to check zip codes in a range. You could use this code for inspiration to make a simple single post code check, if you want:
if local_test_environment?
# SQLite supports GLOB which is similar to LIKE (which it only has limited support for), for matching in strings.
where("(zip_code NOT GLOB '*[^0-9]*' AND zip_code <> '') AND (CAST(zip_code AS int) >= :range_start AND CAST(zip_code AS int) <= :range_finish)", range_start: range_start, range_finish: range_finish)
else
# SQLServer supports LIKE with more advanced matching in strings than what SQLite supports.
# SQLServer supports TRY_PARSE which is non-standard SQL, but fixes the error SQLServer gives with CAST, namely: Conversion failed when converting the nvarchar value 'US-19803' to data type int.
where("(zip_code NOT LIKE '%[^0-9]%' AND zip_code <> '') AND (TRY_PARSE(zip_code AS int) >= :range_start AND TRY_PARSE(zip_code AS int) <= :range_finish)", range_start: range_start, range_finish: range_finish)
end
Use regex.
SELECT * FROM S1234.PERSON
WHERE ZIP REGEXP '^[0-9]+$'

T-SQL Split String Like Clause

I have declare @a varchar(100) = 'abc bcd cde def'. What I need is to select from a table where a column is like 'abc' or 'bcd' or 'cde' or 'def'. I can use a split function and a while loop to get what I want, but somewhere I saw a smart solution using replace or something similar and I just can't remember it.
I know I can use an xml variable, and parse it that way. However, the value is part of a large procedure, and the best way for me is to use it in string form.
I know I can solve this by building a dynamic sql query, but that is not an option in the domain I'm working in.
Damn, I just can't remember the solution. It's a hack, a dirty little trick that does the job.
Anyway, I'll use the code below (I'm on SQL Server 2008). Is it a good idea? I prefer it over the dirty split. Does it perform better?
declare @w varchar(100) = 'some word'
declare @f xml
set @f = '<word>' + replace(@w, ' ', '</word><word>') + '</word>'
select
template.item.value('.', 'varchar(100)') as word
from @f.nodes('/word') template(item)
Use a function to split the individual items into a table, one record per item. Then you simply join to that table.
insert into #FilterTable (filters)
select Items from dbo.Split(@YourFilterString)
select *
from YourTable yt
join #FilterTable f on f.filters = yt.YourColumn
Of course my example is using equality. It gets more complicated if you truly intend to use "like" with wildcards.
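For the wildcard case, one possible shape is sketched below. It reuses the XML split trick from the question instead of dbo.Split, so it needs no helper function. The table variable @YourTable and its rows are hypothetical sample data, and the sketch assumes the words contain no XML special characters (such as & or <) and no LIKE wildcards (%, _, [).

```sql
-- Hypothetical sample data standing in for YourTable.
DECLARE @YourTable TABLE (YourColumn varchar(100));
INSERT INTO @YourTable (YourColumn)
VALUES ('xxabcxx'), ('nothing to match'), ('cde');

-- Turn the space-separated list into one <word> element per item.
DECLARE @a varchar(100) = 'abc bcd cde def';
DECLARE @f xml = '<word>' + REPLACE(@a, ' ', '</word><word>') + '</word>';

-- Keep each row that contains at least one of the words.
SELECT yt.YourColumn
FROM @YourTable AS yt
WHERE EXISTS (
    SELECT 1
    FROM @f.nodes('/word') AS template(item)
    WHERE yt.YourColumn LIKE '%' + template.item.value('.', 'varchar(100)') + '%'
);
```

Using EXISTS rather than a join avoids returning the same row several times when it matches more than one word.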
In tsql you can use a pattern col like '[abcd]'
http://msdn.microsoft.com/en-us/library/ms179859.aspx
For matching multiple words (not letters) and without dynamic SQL, you'll have to get the values into a temp table. For a split function try this page http://www.sommarskog.se/arrays-in-sql-2005.html#iterative and look at the List of Strings function iter_charlist_to_table.
Or maybe you are thinking of this little trick Parameterize an SQL IN clause from the SO CEO.
For four sections at most:
WHERE
PARSENAME(REPLACE(@a, ' ', '.'), 1) = 'xxx'
OR
PARSENAME(REPLACE(@a, ' ', '.'), 2) = 'xxx'
OR
PARSENAME(REPLACE(@a, ' ', '.'), 3) = 'xxx'
OR
PARSENAME(REPLACE(@a, ' ', '.'), 4) = 'xxx'

A small help needed in a _SQL Conversion of Decimal_ using **CASE**

I need to convert a value to decimal. Before that, I check a condition: I want to drop the decimal places if @tbt=1.
E.g. if @tbt=1 then 15
if @tbt=0 then 15.233
declare @tbt int =1
1) select
CASE WHEN @tbt=1 THEN CONVERT(DECIMAL(24,0),15.23335)
ELSE CONVERT(DECIMAL(24,3),15.23335) END
2) select
CASE WHEN @tbt=1 THEN '1'
ELSE '2' END
The first query returns 15.000.
1. Is it possible to get 15?
2. If CONVERT(DECIMAL(24,0),15.23335) returns 15, then why does 15.000 come back from the query?
For checking I used another query and it prints 2.
Thanks
You could use your current solution and add an additional cast to VARCHAR(30) on both branches.
You can't force it to return 2 separate datatypes like this depending on the CASE.
If you insert that result into a table using SELECT INTO syntax, you'll actually see the datatype is not DECIMAL(24,0) but DECIMAL(27,3)
i.e.
declare @tbt int =1
select
CASE WHEN #tbt=1 THEN CONVERT(DECIMAL(24,0),15.23335)
ELSE CONVERT(DECIMAL(24,3),15.23335) END AS Col
INTO SomeTestTable
--Now check the SomeTestTable schema
So what SQL Server has done, is rationalised it down to a single datatype definition that can fulfil BOTH cases.
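The unified type can also be inspected without creating a table. The sketch below is an alternative check, not the answerer's method: it wraps the same CASE expression in a derived table and reads its metadata with SQL_VARIANT_PROPERTY. The column aliases are hypothetical.

```sql
-- Sketch: inspect the data type that the CASE expression resolves to.
DECLARE @tbt int = 1;

SELECT
    SQL_VARIANT_PROPERTY(r.Col, 'BaseType')  AS ResultBaseType,
    SQL_VARIANT_PROPERTY(r.Col, 'Precision') AS ResultPrecision,
    SQL_VARIANT_PROPERTY(r.Col, 'Scale')     AS ResultScale
FROM (
    SELECT CASE WHEN @tbt = 1 THEN CONVERT(DECIMAL(24,0), 15.23335)
                ELSE CONVERT(DECIMAL(24,3), 15.23335) END AS Col
) AS r;
```

If the description above holds, this should report decimal with precision 27 and scale 3 regardless of the value of @tbt, since the type of a CASE expression is fixed before any branch is evaluated.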
WITH T(tbt, val) AS
(
select 1,15.23335 UNION ALL
select 0,15.23335
)
Select
CASE WHEN tbt=1 THEN cast( CONVERT(DECIMAL(24,0),val) as sql_variant)
ELSE CONVERT(DECIMAL(24,3),val) END
FROM T
returns
15
15.233
Thanks to bw_üezi.
I got the answer after considering his advice. Thanks to everyone else as well.
Here is my answer:
declare @tbt int =0
select
CASE WHEN @tbt=1 THEN CAST(CONVERT(DECIMAL(24,0),15.23335)AS NVARCHAR)
ELSE CAST(CONVERT(DECIMAL(24,3),15.23335)AS NVARCHAR) END

Testing for whitespace in SQL Server

I've got some blank values in my table, and I can't seem to catch them in an IF statement.
I've tried
IF @value = '' and IF @value = NULL, and neither one catches the blank values. Is there any way to test whether or not a varchar is entirely whitespace?
AHA! Turns out I was testing for null wrong. Thanks.
ltrim(rtrim(isNull(@value,''))) = ''
To compare with NULL, use the IS NULL keyword.
--Generic example:
SELECT *
FROM MY_TABLE
WHERE SOME_FIELD IS NULL;
--Instead of
SELECT *
FROM MY_TABLE
WHERE SOME_FIELD = NULL;
if len(@value) = 0 or @value is null
LTRIM(RTRIM(@Value)) = ''
should do the trick.
where len(rtrim(ltrim(yourcolumnname))) = 0 OR yourcolumnname is null
I just did some testing, and found out something interesting.
I used to write my queries like so:
SELECT *
FROM TableA
WHERE Val IS NOT NULL
AND LEN(RTRIM(LTRIM(Val))) > 0
But, you don't actually need to check for null, all you need to do is check the length after trimming the value.
SELECT *
FROM TableA
WHERE LEN(RTRIM(LTRIM(Val))) > 0
This select weeds out nulls as well as any columns with just white space.
As it turns out, you don't need to trim the value because LEN ignores trailing spaces, so all you actually need is this:
SELECT *
FROM TableA
WHERE LEN(Val) > 0
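The behaviour this relies on can be seen with a few sample values. The sketch below uses a hypothetical table variable @TableA in place of the real table, and shows LEN next to DATALENGTH, which does count trailing spaces.

```sql
-- Hypothetical sample rows: NULL, empty, spaces only, and two non-blank values.
DECLARE @TableA TABLE (Val varchar(20) NULL);
INSERT INTO @TableA (Val)
VALUES (NULL), (''), ('   '), ('abc'), (' abc ');

-- LEN drops trailing spaces; DATALENGTH reports the stored bytes.
SELECT Val,
       LEN(Val)        AS CharsIgnoringTrailingSpaces,
       DATALENGTH(Val) AS StoredBytes
FROM @TableA;

-- LEN(NULL) is NULL, so the comparison filters out NULLs as well.
SELECT Val FROM @TableA WHERE LEN(Val) > 0;
```

This only covers the space character. Values made up of tabs or line breaks still have a non-zero LEN, so they would need an explicit check if they can occur in the data.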
Rather than performing excessive string manipulation with LTRIM and RTRIM, just search the expression for the first "non-space".
SELECT
*
FROM
[Table]
WHERE
COALESCE(PATINDEX('%[^ ]%', [Value]), 0) > 0
You may have fields with multiple spaces (' ') so you'll get better results if you trim that:
where ltrim(yourcolumnname) = ''