Performance Issue with collate SQL_Latin1_General_CP1_CI_AS - sql

In my query I have used collate SQL_Latin1_General_CP1_CI_AS which results in very bad performance (almost 2 minutes).
Could you please suggest how I can resolve my performance issue?
Basically I was getting this error:
Cannot resolve the collation conflict between "Latin1_General_CS_AS" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation
Code:
CREATE TABLE #tmpConverted
(
PolicyNo NVarchar(10),
ShowToken NVarChar(16)
)
SELECT *
FROM #tmpConverted tpd
WHERE EXISTS (SELECT 1
FROM [dbo].CheckRecords cr
WHERE cr.DetailRecord LIKE '%' + tpd.ShowToken + '%' COLLATE SQL_Latin1_General_CP1_CI_AS)
Sample data #tmpConverted:

Related

OPENJSON collation in Azure Synapse causes a collation conflict error

I have an OPENJSON command that takes the parsed JSON and LEFT joins it onto an existing table.
When I add the LEFT JOIN I get the error:
collation conflict between "Latin1_General_BIN2" and "SQL_Latin1_General_CP1_CI_AS"
The table has the same collation for all string columns as: SQL_Latin1_General_CP1_CI_AS
I've tried adding COLLATE DATABASE_DEFAULT in the LEFT JOIN, but with no improvement.
The query I'm using is roughly as:
DECLARE @json NVARCHAR(MAX) = '
{
"ExampleJson": {
"stuff": [
{
"_program_id": "hello",
"work_date": "2021-03-23 00:00:00"
}
]
}
}';
SELECT *
FROM
OPENJSON
(
(
@json
), '$.ExampleJson.stuff'
)
WITH (
[program_id] NVARCHAR(255) '$."program_id"'
,[work_date] DATETIME '$."work_date"'
) [json_data]
LEFT JOIN
[existing_db_data]
ON [existing_db_data].[program_id] = [json_data].[program_id]
Interesting problem, which I cannot reproduce, but you should be able to resolve it by placing the correct collation either in the WITH clause or after the join, e.g.:
WITH (
[program_id] NVARCHAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS '$."_program_id"'
,[work_date] DATETIME '$."work_date"'
) [json_data]
LEFT JOIN [existing_db_data] d ON d.[program_id] = [json_data].[program_id] COLLATE SQL_Latin1_General_CP1_CI_AS
Either should suffice, you do not need both.

How to improve the search speed when using left join SQL Server?

In SQL Server, I query data from two VIEW to get records whose OLDPID are -1,-2:
SELECT
T1.*, T2.LEAF
FROM
(SELECT *
FROM VIEW_OBJECT_TREE_DATA
WHERE OLDPID IN (-1, -2)) T1
LEFT JOIN
VIEW_OBJECT_TREE_DATA_GROUP T2 ON T1.NODEID = T2.NODEPID
WHERE
T1.STATE = 1
But it takes 3-4 seconds to get the result.
How can I modify this SQL query to improve its speed?
VIEW_OBJECT_TREE_DATA has OLDPID, OLDID and NAME columns with 450000 records. VIEW_OBJECT_TREE_DATA_GROUP has NODEPID and LEAF with 65000 records.
Below is the SQL for the views and the function:
VIEW_OBJECT_TREE_DATA:
CREATE VIEW dbo.VIEW_OBJECT_TREE_DATA
AS
SELECT(SELECT[dbo].[FNNC_GET_TREE_GUID](0, OBJECT_ID)) AS NODEID,
(SELECT[dbo].[FNNC_GET_TREE_GUID](0, PARENT_ID)) AS NODEPID, 'MY_OBJECT_TABLE' AS[TABLE],
OBJECT_ID AS OLDID, PARENT_ID AS OLDPID, OBJECT_NAME COLLATE DATABASE_DEFAULT AS NAME,
OBJECT_CODE COLLATE database_default AS CODE, OBJECT_TYPE COLLATE database_default AS TYPE,
OBJECT_STATE as STATE
FROM dbo.MY_OBJECT_TABLE
WHERE OBJECT_STATE <> -1
UNION
SELECT(SELECT[dbo].[FNNC_GET_TREE_GUID](1, INDICATOR_ID)) AS NODEID,
(SELECT[dbo].[FNNC_GET_TREE_GUID](0, OBJECT_ID)) AS NODEPID, 'MY_INDICATOR_TABLE' AS[TABLE],
INDICATOR_ID AS OLDID, OBJECT_ID AS OLDPID, INDICATOR_NAME COLLATE DATABASE_DEFAULT AS NAME,
INDICATOR_CODE COLLATE database_default AS CODE, INDICATOR_TYPE COLLATE database_default AS TYPE,
INDICATOR_STATE AS STATE
FROM dbo.MY_INDICATOR_TABLE
WHERE INDICATOR_STATE <> -1
VIEW_OBJECT_TREE_DATA_GROUP :
CREATE VIEW VIEW_OBJECT_TREE_DATA_GROUP
AS
SELECT NODEPID,COUNT(0) AS LEAF FROM VIEW_OBJECT_TREE_DATA GROUP BY NODEPID
Function:
USE[MY_DATABASE]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[FNNC_GET_TREE_GUID](@TYPE INT, @ID INT)
RETURNS UNIQUEIDENTIFIER
AS
BEGIN
RETURN CAST(CAST(@TYPE AS binary(4))+CAST(@ID AS varbinary(28)) AS UNIQUEIDENTIFIER)
END
You can directly join those two views without having to use a subquery.
select TD.*, DG.LEAF
from VIEW_OBJECT_TREE_DATA as TD
left join VIEW_OBJECT_TREE_DATA_GROUP as DG on DG.NODEPID = TD.NODEID
where TD.OLDPID in (-1, -2) and
TD.STATE = 1
Although without seeing those views and the execution plan there is no way of knowing what slows you down.

collation conflict between "Hebrew_CI_AS" and "SQL_Latin1_General_CP1_CI_AS"

In a procedure that I am working on, I wrote this code:
update a
set a.custName = b.custName
from #x as a inner join pl_Customer as b on a.Company_Code = b.Company_Code and a.cust = b.Cust
and I got this error:
Cannot resolve the collation conflict between "Hebrew_CI_AS" and
"SQL_Latin1_General_CP1_CI_AS" in the equal to operation.
I tried to solve it with this:
update a
set a.custName = b.custName
from #x as a inner join pl_Customer as b on a.Company_Code = b.Company_Code and a.cust = b.Cust
collate Latin1_General_CI_AI;
but it still raises the error.
Temporary tables are created in tempdb, so by default their columns use tempdb's collation, which normally matches the server's. It looks like your server's collation is SQL_Latin1_General_CP1_CI_AS and the database's (actually, the column's) is Hebrew_CI_AS, or vice versa.
You can overcome this by using collate database_default in the temporary table's column definitions, e.g.:
create table #x (
ID int PRIMARY KEY,
Company_Code nvarchar(20) COLLATE database_default,
Cust nvarchar(20) COLLATE database_default,
...
)
This will create the columns using the current database's collation, not the server's.
In the definition of your temp table #x, add COLLATE DATABASE_DEFAULT to the string columns, for example:
custName nvarchar(xx) COLLATE DATABASE_DEFAULT NOT NULL
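To see which collations are actually in play before changing anything, the server, current database, and tempdb collations can be compared directly. The following is only a minimal sketch: the column sizes are assumptions (the original definition of #x is not shown), and it assumes the pl_Customer columns use the database's default collation.

```sql
-- Compare the collations involved (tempdb normally follows the server collation).
SELECT SERVERPROPERTY('Collation')                AS ServerCollation,
       DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS CurrentDbCollation,
       DATABASEPROPERTYEX('tempdb', 'Collation')  AS TempdbCollation;

-- Column sizes are assumed for illustration; adjust them to match pl_Customer.
CREATE TABLE #x
(
    Company_Code nvarchar(20)  COLLATE DATABASE_DEFAULT,
    Cust         nvarchar(20)  COLLATE DATABASE_DEFAULT,
    custName     nvarchar(100) COLLATE DATABASE_DEFAULT
);

-- If both sides now share a collation, the join needs no COLLATE clause.
UPDATE a
SET    a.custName = b.custName
FROM   #x AS a
INNER JOIN pl_Customer AS b
        ON a.Company_Code = b.Company_Code
       AND a.Cust = b.Cust;

DROP TABLE #x;
```

If the first query shows the same collation for the database and tempdb, the conflict comes from an individual column's collation instead, and that column is the one to inspect.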

COLLATE in UDF does not work as expected

I have a table with a text field. I want to select the rows where the text is in all caps. This code works as it should, and returns ABC:
SELECT txt
FROM (SELECT 'ABC' AS txt UNION SELECT 'cdf') t
WHERE
txt COLLATE SQL_Latin1_General_CP1_CS_AS = UPPER(txt)
Then I created a UDF (as suggested here):
CREATE FUNCTION [dbo].[fnsConvert]
(
@p NVARCHAR(2000) ,
@c NVARCHAR(2000)
)
RETURNS NVARCHAR(2000)
AS
BEGIN
IF ( @c = 'SQL_Latin1_General_CP1_CS_AS' )
SET @p = @p COLLATE SQL_Latin1_General_CP1_CS_AS
RETURN @p
END
and run it as follows (which looks like an equivalent code to me):
SELECT txt
FROM (SELECT 'ABC' AS txt UNION SELECT 'cdf') t
WHERE
dbo.fnsConvert(txt, 'SQL_Latin1_General_CP1_CS_AS') = UPPER(txt)
however, this returns ABC as well as cdf.
Why is that so, and how do I get this to work?
PS I need UDF here to be able to call case-sensitive comparison from .Net LINQ2SQL provider.
A variable cannot have its own collation. It always uses the default collation of the current database. Check this:
--I declare three variables, each of which gets its own collation - at least one might think so:
DECLARE @deflt VARCHAR(100) = 'aBc'; --Latin1_General_CI_AS in my system
DECLARE @Arab VARCHAR(100) = 'aBc' COLLATE Arabic_100_CS_AS_WS_SC;
DECLARE @Rom VARCHAR(100) = 'aBc' COLLATE Romanian_CI_AI
--Now check this. All three variables are seen as the system's default collation:
SELECT [name], system_type_name, collation_name
FROM sys.dm_exec_describe_first_result_set(N'SELECT @deflt AS Deflt, @Arab AS Arab, @Rom AS Rom'
,N'@deflt varchar(100), @Arab varchar(100),@Rom varchar(100)'
,0);
/*
name system_type_name collation_name
Deflt varchar(100) Latin1_General_CI_AS
Arab varchar(100) Latin1_General_CI_AS
Rom varchar(100) Latin1_General_CI_AS
*/
--Now we check a simple comparison of "aBc" against "ABC"
SELECT CASE WHEN @deflt = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckDefault
,CASE WHEN @Arab = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckArab
,CASE WHEN @Rom = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckRom
/*CI CI CI*/
--But we can specify the collation for one given action!
SELECT CASE WHEN @deflt = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckDefault
,CASE WHEN @Arab = 'ABC' COLLATE Arabic_100_CS_AS_WS_SC THEN 'CI' ELSE 'CS' END AS CheckArab
,CASE WHEN @Rom = 'ABC' COLLATE Romanian_CI_AI THEN 'CI' ELSE 'CS' END AS CheckRom
/*CI CS CI*/
--But a table's column will behave differently:
CREATE TABLE #tempTable(deflt VARCHAR(100)
,Arab VARCHAR(100) COLLATE Arabic_100_CS_AS_WS_SC
,Rom VARCHAR(100) COLLATE Romanian_CI_AI);
INSERT INTO #tempTable(deflt,Arab,Rom) VALUES('aBc','aBc','aBc');
SELECT [name], system_type_name, collation_name
FROM sys.dm_exec_describe_first_result_set(N'SELECT * FROM #tempTable',NULL,0);
DROP TABLE #tempTable;
/*
name system_type_name collation_name
deflt varchar(100) Latin1_General_CI_AS
Arab varchar(100) Arabic_100_CS_AS_WS_SC
Rom varchar(100) Romanian_CI_AI
*/
--This applies to declared table variables as well. The comparison "knows" the specified collation:
DECLARE @TableVariable TABLE(deflt VARCHAR(100)
,Arab VARCHAR(100) COLLATE Arabic_100_CS_AS_WS_SC
,Rom VARCHAR(100) COLLATE Romanian_CI_AI);
INSERT INTO @TableVariable(deflt,Arab,Rom) VALUES('aBc','aBc','aBc');
SELECT CASE WHEN tv.deflt = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckDefault
,CASE WHEN tv.Arab = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckArab
,CASE WHEN tv.Rom = 'ABC' THEN 'CI' ELSE 'CS' END AS CheckRom
FROM @TableVariable AS tv
/*CI CS CI*/
UPDATE: Some documentation
At this link you can read about the details. A collation does not change the value. It applies a rule (comparable to NOT NULL, which does not change the values either, but only adds the rule whether NULL can be set or not).
The documentation states clearly:
Is a clause that can be applied to a database definition or a column definition to define the collation, or to a character string expression to apply a collation cast.
And a bit later you'll find
Creating or altering a database
Creating or altering a table column
Casting the collation of an expression
UPDATE 2: A suggestion for a solution
If you want to have control whether a comparison is done CS or CI you might try this:
DECLARE @tbl TABLE(SomeValueInDefaultCollation VARCHAR(100));
INSERT INTO @tbl VALUES ('ABC'),('aBc');
DECLARE @CompareCaseSensitive BIT = 0;
DECLARE @SearchFor VARCHAR(100) = 'aBc';
SELECT *
FROM @tbl
WHERE (@CompareCaseSensitive=1 AND SomeValueInDefaultCollation=@SearchFor COLLATE Latin1_General_CS_AS)
OR (ISNULL(@CompareCaseSensitive,0)=0 AND SomeValueInDefaultCollation=@SearchFor COLLATE Latin1_General_CI_AS);
With @CompareCaseSensitive set to 1 it returns just the aBc row; with NULL or 0 it returns both rows.
This will - for sure! - perform much better than a UDF.
Please try the BINARY_CHECKSUM function; there is no need for a UDF:
SELECT txt
FROM (SELECT 'ABC' AS txt UNION SELECT 'cdf') t
WHERE
BINARY_CHECKSUM(txt)= BINARY_CHECKSUM(UPPER(txt))
I think you are confused on how collation works. If you want to force a case sensitive collation you would do it in your where predicate, not with a function like that. And scalar functions are horrible for performance.
Here is how you would be able to use collation for this type of thing.
SELECT txt
FROM (SELECT 'ABC' AS txt UNION SELECT 'cdf') t
WHERE txt collate SQL_Latin1_General_CP1_CS_AS = UPPER(txt)
Here's what I did:
I changed the function to perform a comparison, instead of setting the collation, and then return a 1 or 0.
CREATE FUNCTION [dbo].[fnsConvert]
(
@p NVARCHAR(2000) ,
@c NVARCHAR(2000)
)
RETURNS BIT
AS
BEGIN
DECLARE @result BIT
IF ( @c = 'SQL_Latin1_General_CP1_CS_AS' )
BEGIN
IF @p COLLATE SQL_Latin1_General_CP1_CS_AS = UPPER(@p)
SET @result = 1
ELSE
SET @result = 0
END
ELSE
SET @result = 0
RETURN @result
END
Then the query that uses the function changes just a bit.
SELECT txt
FROM (SELECT 'ABC' AS txt UNION SELECT 'cdf') t
WHERE
dbo.fnsConvert(txt, 'SQL_Latin1_General_CP1_CS_AS') = 1
As @Shnugo stated, collation is not an attribute of a variable, but it can be an attribute of a column definition.
For collation-enabled comparison outside of TSQL, you can define a (persisted) computed column with an explicit collation:
create table Q47890189 (
txt nvarchar(100),
colltxt as txt collate SQL_Latin1_General_CP1_CS_AS persisted
)
insert into Q47890189 (txt) values ('ABC')
insert into Q47890189 (txt) values ('cdf')
select * from Q47890189 where txt = UPPER(txt)
select * from Q47890189 where colltxt = UPPER(colltxt)
Note that a persisted column can also be indexed, and has a better performance than calling a scalar function.
COLLATE: Is a clause that can be applied to a database definition or a column definition to define the collation, or to a character string expression to apply a collation cast.
COLLATE does not convert any column or variable. It only defines the characteristics of the collation.
CREATE TABLE [dbo].[OINV] (
[CardCode] [nvarchar](50) NULL
)
If I have a table with 5175460 rows,
then converting this column to another data type takes time, because each of its values is converted to the new data type.
alter table OINV
alter column CardCode varchar(50)
--1 min 45 sec
alter table OINV
alter column CardCode nvarchar(50) COLLATE SQL_Latin1_General_CP1_CS_AS
If I don't convert the data type and only want to change the collation,
it takes 1 ms. That means it does not convert the 5175460 rows to the said collation.
It just defines the collation on that column.
When this column is used in a WHERE condition, it exhibits the characteristics of that collation.
A UDF/TVF is not a performant way to do this. The best way is to alter the table.
Another example,
declare @i varchar(60)='ABC'
SELECT txt
FROM (SELECT 'abc' AS txt UNION SELECT 'cdf') t
WHERE
txt = @i COLLATE SQL_Latin1_General_CP1_CS_AS
I can't declare it like this:
declare @i varchar(60) COLLATE SQL_Latin1_General_CP1_CS_AS='ABC'
So a variable exhibits the characteristics of a collation only while it is used together with COLLATE.
In your case you are returning only a plain variable.
The UDF way of doing this:
CREATE FUNCTION testfn (
@test VARCHAR(100)
,@i INT
)
RETURNS TABLE
AS
RETURN (
-- insert into #t values(@test)
SELECT @test COLLATE SQL_Latin1_General_CP1_CS_AS AS a
)
SELECT *
FROM (
SELECT 'ABC' AS txt
UNION
SELECT 'cdf'
) t
OUTER APPLY dbo.testfn(txt, 0) fn
WHERE fn.a = UPPER(txt)
To use multiple collations you have to define multiple tables with different collations. A TVF can return only a static table schema, so only one collation can be defined.
Therefore a TVF is not the right way to perform your task.
I agree with @Shnugo: when you create a local variable it takes the default collation.
But you could explicitly apply your chosen collation to the value returned by the function, as follows:
select * from
(SELECT 'ABC' AS txt UNION SELECT 'cdf') a
where (dbo.fnsConvert(txt, 'SQL_Latin1_General_CP1_CS_AS')
collate SQL_Latin1_General_CP1_CS_AS) = UPPER(txt)
In addition, the COLLATE clause can only be applied to a database definition, a column definition, or a character string expression; in other words, it is used for database objects, i.e. tables, columns, indexes.
collation_name can't be represented by a variable or an expression.
MSDN clearly defines COLLATE:
Is a clause that can be applied to a database definition or a column
definition to define the collation, or to a character string
expression to apply a collation cast.
Can you see a word about variable here?
If you need a UDF, just use a table-valued function:
CREATE FUNCTION dbo.test
(
@text nvarchar(max)
)
RETURNS TABLE
AS
RETURN
(
SELECT c COLLATE SQL_Latin1_General_CP1_CS_AS as txt
FROM (VALUES (@text)) as t(c)
)
GO
And use it like:
;WITH cte AS (
SELECT N'ABC' as txt
UNION
SELECT N'cdf'
)
SELECT c.txt
FROM cte c
OUTER APPLY dbo.test (c.txt) t
WHERE t.txt = UPPER(c.txt)
Output:
txt
------
ABC

SQL query issue in avoid duplicates in INSERT INTO SELECT?

The following query works perfectly,
insert into [EGallery].dbo.[CustomerDetails]
Select Distinct B.CountyB as 'Mobile' , Cast(BuildingB as Varchar(100)) as 'Email' ,
A.CardCode , A.CardName as 'First Name' , '' as 'Last Name' ,
'' as Gender , Cast(A.Address as Varchar(1000)) as 'Address' , Convert(Varchar(10), A.U_BirthDay,105) as 'birthday' ,
Convert(Varchar(10), A.U_AnnivDay ,105) as 'Anniversary' ,
Case
When A.CardCode Like '%%'+ C.WhsCode +'%%' Then Convert(Varchar(10) , A.DocDate ,105)
Else Convert(Varchar(10), (Select X.CreateDate From OCRD X Where X.CardCode = A.CardCode) ,105) End as 'JoinDate' ,
C.WhsCode as 'JoinStore','Open' as Status ,(Select GETDATE()) as CreatedDateTime,(Select GETDATE()) as ProcessDateTime, '' as StatusMSg
From OINV A
Inner Join INV12 B On A.DocEntry = B.DocEntry
Inner Join INV1 C On A.DocEntry = C.DocEntry
Where C.LineNum = '0'
--B.CountyB not in(select D.Mobile from [EGallery].dbo.[CustomerDetails] D where D.Mobile=B.CountyB)
--not exists (select Mobile from [EGallery].dbo.[CustomerDetails] D where D.Mobile=B.CountyB)
But before I insert records into the [EGallery].dbo.[CustomerDetails] table, I need to check whether the phone number already exists in the table. If the record already exists, there is no need to insert it again. For that I have added one more condition (which I have commented out in the query) but it reports this error while running the query:
Cannot resolve the collation conflict between "SQL_Latin1_General_CP850_CI_AS" and "Latin1_General_CI_AI" in the equal to operation.
According to here you have to add COLLATE DATABASE_DEFAULT to the queries like this:
Where C.LineNum = '0' AND
B.CountyB not in(select D.Mobile from [EGallery].dbo.[CustomerDetails] D where D.Mobile COLLATE DATABASE_DEFAULT = B.CountyB COLLATE DATABASE_DEFAULT) AND
not exists (select Mobile from [EGallery].dbo.[CustomerDetails] D where D.Mobile COLLATE DATABASE_DEFAULT = B.CountyB COLLATE DATABASE_DEFAULT)
Try to do this before your query:
USE [db name for object INV12]
GO
ALTER TABLE [EGallery].dbo.[CustomerDetails]
ALTER COLUMN Mobile
VARCHAR(100) COLLATE Latin1_General_CI_AS NOT NULL
ALTER TABLE INV12
ALTER COLUMN CountyB
VARCHAR(100) COLLATE Latin1_General_CI_AS NOT NULL
UPDATE1:
If you have an index on one or both of these columns, you need to drop it before changing the collation and create it again afterwards.
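The drop-and-recreate sequence could look roughly like the sketch below. The index name IX_CustomerDetails_Mobile is hypothetical and only stands in for whatever index actually exists on the column.

```sql
USE [EGallery];
GO
-- IX_CustomerDetails_Mobile is a hypothetical index name used for illustration.
DROP INDEX IX_CustomerDetails_Mobile ON dbo.CustomerDetails;

-- Change the collation while no index depends on the column.
ALTER TABLE dbo.CustomerDetails
    ALTER COLUMN Mobile VARCHAR(100) COLLATE Latin1_General_CI_AS NOT NULL;

-- Recreate the index so it is built with the new collation.
CREATE INDEX IX_CustomerDetails_Mobile ON dbo.CustomerDetails (Mobile);
```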
I recommend using a MERGE statement, as it
performs insert, update, or delete operations on a target table based on the results of a join with a source table. For example, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table.
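As a rough sketch of how MERGE could insert only the phone numbers that are not yet present: the target column list (Mobile, CardCode) is an assumption, since the real column list of CustomerDetails is not shown in the question, and the remaining columns of the original SELECT would have to be added in the same way.

```sql
MERGE [EGallery].dbo.[CustomerDetails] AS tgt
USING
(
    -- One row per phone number, so the same number is not inserted twice in one run.
    SELECT B.CountyB AS Mobile, MIN(A.CardCode) AS CardCode
    FROM OINV A
    INNER JOIN INV12 B ON A.DocEntry = B.DocEntry
    INNER JOIN INV1 C ON A.DocEntry = C.DocEntry
    WHERE C.LineNum = '0'
    GROUP BY B.CountyB
) AS src
    -- COLLATE DATABASE_DEFAULT on both sides avoids the collation conflict.
    ON tgt.Mobile COLLATE DATABASE_DEFAULT = src.Mobile COLLATE DATABASE_DEFAULT
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Mobile, CardCode)          -- assumed target columns
    VALUES (src.Mobile, src.CardCode);
```

Grouping by the phone number in the source keeps a single row per number, which matters because MERGE evaluates all source rows against the target as it was before the statement started.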