SQL Server: join on uniqueidentifier

I have two tables, UserBackup and Requests.
Below is the script for both tables.
UserBackup
CREATE TABLE UserBackup(
FileName varchar(70) NOT NULL
)
The file name is a GUID. Sometimes there is additional information related to the file, so we have entries like guid_Add in the table.
Requests
CREATE TABLE Requests(
RequestId UNIQUEIDENTIFIER NOT NULL,
Status int Not null
)
Here are some sample rows:
UserBackup table:
FileName
15b993cc-e8be-405d-bb9f-0c58b66dcdfe
4cffe724-3f68-4710-b785-30afde5d52f8
4cffe724-3f68-4710-b785-30afde5d52f8_Add
7ad22838-ddee-4043-8d1f-6656d2953545
Requests table:
RequestId Status
15b993cc-e8be-405d-bb9f-0c58b66dcdfe 1
4cffe724-3f68-4710-b785-30afde5d52f8 1
7ad22838-ddee-4043-8d1f-6656d2953545 2
What I need is to return all rows from the UserBackup table whose file name (the GUID) matches a RequestId in the Requests table where the status is 1. So here is the query I wrote:
Select *
from UserBackup
inner join Requests on UserBackup.FileName = Requests.RequestId
where Requests.Status = 1
And this works fine. It returns me the following result
FileName RequestId Status
15b993cc-e8be-405d-bb9f-0c58b66dcdfe 15b993cc-e8be-405d-bb9f-0c58b66dcdfe 1
4cffe724-3f68-4710-b785-30afde5d52f8 4cffe724-3f68-4710-b785-30afde5d52f8 1
4cffe724-3f68-4710-b785-30afde5d52f8_Add 4cffe724-3f68-4710-b785-30afde5d52f8 1
This is exactly what I want. But what I don't understand is how it works. Notice that the result returns the 4cffe724-3f68-4710-b785-30afde5d52f8_Add row as well. The inner join is between a varchar and a uniqueidentifier, and instead of behaving like an "equals" comparison this join behaves like a "contains" comparison. I want to know how this works so that I can be sure to use this code without any unexpected scenarios.

The values on both sides of a comparison have to be of the same data type. There's no such thing as, say, comparing a uniqueidentifier and a varchar.
uniqueidentifier has a higher data type precedence than varchar, so the varchars will be implicitly converted to uniqueidentifiers before the comparison occurs.
Unfortunately, you get no error or warning if the string contains more characters than are needed:
select CONVERT(uniqueidentifier,'4cffe724-3f68-4710-b785-30afde5d52f8_Add')
Result:
4CFFE724-3F68-4710-B785-30AFDE5D52F8
If you want to force the comparison to occur between strings, you'll have to perform an explicit conversion:
Select *
from UserBackup
inner join Requests
on UserBackup.FileName = CONVERT(varchar(70),Requests.RequestId)
where Requests.Status = 1
Note, though, that this string comparison returns only exact matches, so the 4cffe724-3f68-4710-b785-30afde5d52f8_Add row from the question would no longer appear in the results.

When you compare two columns of different data types, SQL Server will attempt an implicit conversion on the side with the lower data type precedence.
The following comes from the MSDN docs on uniqueidentifier:
The following example demonstrates the truncation of data when the
value is too long for the data type being converted to. Because the
uniqueidentifier type is limited to 36 characters, the characters that
exceed that length are truncated.
DECLARE @ID nvarchar(max) = N'0E984725-C51C-4BF4-9960-E1C80E27ABA0wrong';
SELECT @ID, CONVERT(uniqueidentifier, @ID) AS TruncatedValue;
http://msdn.microsoft.com/en-us/library/ms187942.aspx
The documentation is clear that the data is truncated.
Whenever you are unsure about a join operation, you can verify it in the Actual Execution Plan.
Here is a test sample that you can run inside SSMS or SQL Sentry Plan Explorer:
DECLARE @userbackup TABLE ( _FILENAME VARCHAR(70) )
INSERT INTO @userbackup
VALUES ( '15b993cc-e8be-405d-bb9f-0c58b66dcdfe' ),
( '4cffe724-3f68-4710-b785-30afde5d52f8' ),
( '4cffe724-3f68-4710-b785-30afde5d52f8_Add' ),
( '7ad22838-ddee-4043-8d1f-6656d2953545' )
DECLARE @Requests TABLE
(
requestID UNIQUEIDENTIFIER
,_Status INT
)
INSERT INTO @Requests
VALUES ( '15b993cc-e8be-405d-bb9f-0c58b66dcdfe', 1 )
, ( '4cffe724-3f68-4710-b785-30afde5d52f8', 1 )
, ( '7ad22838-ddee-4043-8d1f-6656d2953545', 2 )
SELECT *
FROM @userbackup u
JOIN @Requests r
ON u.[_FILENAME] = r.requestID
WHERE r.[_Status] = 1
Instead of a regular join operation, SQL Server is doing a HASH MATCH on Expr1006. In SSMS it is hard to see what this expression is doing, but if you open the plan XML you will find this:
<ColumnReference Column="Expr1006" />
<ScalarOperator ScalarString="CONVERT_IMPLICIT(uniqueidentifier,@userbackup.[_FILENAME] as [u].[_FILENAME],0)">
Whenever in doubt, check the execution plan, and always make sure to match data types when comparing.
There is a great blog post from a Microsoft engineer on this exact problem: Data Mismatch on WHERE Clause might Cause Serious Performance Problems.

What is happening here is that FileName is being converted from varchar to uniqueidentifier, and during that process anything after the first 36 characters is ignored.
You can see it in action here
Select convert(uniqueidentifier, UserBackup.FileName), FileName
from UserBackup
It works, but to reduce confusion for the next person to come along, you might want to store the RequestId associated with the UserBackup as a GUID in the UserBackup table and join on that.
At the very least put a comment in ;)
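For illustration, here is a minimal sketch of that suggestion. The RequestId column on UserBackup is an assumption, not part of the original schema:
-- Hypothetical schema change: store the request's GUID explicitly,
-- so the join needs no implicit conversion.
ALTER TABLE UserBackup ADD RequestId uniqueidentifier NULL;
-- Backfill from the existing file names; LEFT(..., 36) keeps just the GUID part.
UPDATE UserBackup
SET RequestId = CONVERT(uniqueidentifier, LEFT(FileName, 36));
-- The join is now a type-matched equality, with no surprises.
SELECT ub.FileName, r.RequestId, r.Status
FROM UserBackup ub
INNER JOIN Requests r ON ub.RequestId = r.RequestId
WHERE r.Status = 1;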

Related

Error converting nvarchar to bigint in WHERE clause fails but works in SELECT

I am trying to achieve the following:
Get all Field values where the FieldValue is greater than 100 when the
value is stored as a number.
The indicator of whether or not a value was stored as a number is given by the field type, which is another WHERE condition.
The issue I am facing is that when I try to do the data type conversion in my WHERE clause, it fails.
I run:
SELECT FieldValue FROM CARData A
JOIN Fields B ON A.FieldId = B.FieldId
WHERE FieldTypeId = 3 AND FieldValue IS NOT NULL
And this returns the expected results.
But if I add a WHERE condition to filter by the value:
SELECT FieldValue FROM CARData A
JOIN Fields B ON A.FieldId = B.FieldId
WHERE FieldTypeId = 3 AND FieldValue IS NOT NULL
AND CAST(FieldValue AS BIGINT) > 100
It throws the error:
Error converting data type nvarchar to bigint.
I somewhat understand what the issue is: it is trying to convert ALL values in the table to bigint and is failing when it hits a non-numeric value.
I attempted to solve this by nesting the first query in a second like so:
SELECT RESULT.FieldValue FROM (
SELECT FieldValue FROM CARData A
JOIN Fields B ON A.FieldId = B.FieldId
WHERE
FieldTypeId = 3
AND FieldValue IS NOT NULL
AND ISNUMERIC(A.FieldValue) = 1) RESULT
WHERE CAST(FieldValue AS BIGINT) > 100
But even that does not return anything other than the aforementioned error.
While it's true that changing the structure of your query can cause SQL to pick a different plan, you should limit that technique to trying to help the optimizer pick a performant plan. The reason for this is that if successful execution of the logic depends on a particular choice of plan, then your query might work today but fail tomorrow (or in production) when SQL decides to pick a different plan.
Fortunately you don't have to rely on that here! Use TRY_CAST:
SELECT FieldValue FROM CARData A
JOIN Fields B ON A.FieldId = B.FieldId
WHERE FieldTypeId = 3
AND TRY_CAST(FieldValue AS BIGINT) > 100
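TRY_CAST returns NULL when the conversion fails instead of raising an error, and NULL > 100 evaluates to unknown, so non-numeric rows simply drop out of the filter no matter which plan SQL Server picks. A quick illustration (TRY_CAST is available from SQL Server 2012 onwards):
SELECT TRY_CAST(N'250' AS BIGINT) AS numeric_value, -- 250
       TRY_CAST(N'abc' AS BIGINT) AS non_numeric    -- NULL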
I'm also curious... is this part of a SQL course? If so, tell your instructor to come visit StackOverflow so we can tell them to stop teaching students to use EAVs. Unless the whole point is to show you how horrible they are! :)

How do I use SQL Server to select into another table?

I am consolidating a web service. I am replacing multiple calls to the service with one call that contains the data.
I have created a table:
CREATE TABLE InvResults
(
Invoices nvarchar(max),
InvoiceDetails nvarchar(max),
Products nvarchar(max)
);
I used (max) because I don't know how complex the JSON will get at this time.
I need to do some sort of selects like this (this is pseudocode, not actual SQL):
SELECT
(SELECT *
INTO InvResults for Column Invoices
FROM MyInvoiceTable
WHERE SomeColumns = 'someStuffvariable'
FOR JSON PATH, ROOT('invoices')) AS invoices;
SELECT
(SELECT *
INTO InvResults for Column InvoiceDetails
FROM MyInvoiceDetailsTable
WHERE SomeColumns = 'someStuffvariable'
FOR JSON PATH, ROOT('invoicedetails')) AS invoicedetails;
I don't know how to format this and my google skills are failing me at this point. I understand that I probably want to use an UPDATE statement, but I'm not sure how to do this in combination with the rest of my requirements. I'm exploring How do I UPDATE from a SELECT in SQL Server? but I am still at a halt.
The end result should be a table "InvResults" that has 3 columns containing one row with results from Select statements as JSON. The column names should be defined the same as the json root objects.
INSERT INTO InvResults (Invoices, InvoiceDetails)
SELECT
(SELECT *
FROM MyInvoiceTable
WHERE SomeColumns = 'someStuffvariable'
FOR JSON PATH, ROOT('invoices'))
,
(SELECT *
FROM MyInvoiceDetailsTable
WHERE SomeColumns = 'someStuffvariable'
FOR JSON PATH, ROOT('invoicedetails'))
;
Because each SELECT ... FOR JSON returns only one row, the above works. Note that a subquery cannot contain INTO, which is why the "INTO InvResults for Column ..." lines from the pseudocode are gone.
The third column is just as easily added, but that is left for you 😉
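As a quick sanity check (using the table from the question), the result should be a single row whose columns hold the JSON documents:
SELECT Invoices, InvoiceDetails, Products -- Products stays NULL until the third column is populated
FROM InvResults;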

Use value of one column as identifier in another table in SNOWFLAKE

I have two tables, one of which contains the rules for the other.
create table t1(id int, query string)
create table t2(id int, place string)
insert into t1 values (1,'id < 10')
insert into t1 values (2,'id == 10')
And the values in t2 are
insert into t2 values (11,'Nevada')
insert into t2 values (20,'Texas')
insert into t2 values (10,'Arizona')
insert into t2 values (2,'Abegal')
I need to select from the second table according to the rule stored in the first table's column, like:
select * from t2 where {query}
or
with x(query)
as
(select c2 from test)
select * from test where query;
but neither works.
There are a couple of problems with storing criteria in a table like this:
First, as has already been noted, you'll likely have to resort to dynamic SQL, which can get messy, and limits how you can use it.
Second, it's going to be problematic (to say the least) to validate and parse your criteria. What if someone writes a rule of [id] *= 10, or [this_field_doesn't_exist] = blah?
If you're just storing potential values for your [id] column, one solution would be to have your t1 (storing your queries) include a min value and max value, like this:
CREATE TABLE t1
(
[id] INT IDENTITY(1,1) PRIMARY KEY,
min_value INT NULL,
max_value INT NULL
)
Note that both the min and max values can be null. Your provided criteria would then be expressed as this:
INSERT INTO t1
([id], min_value, max_value)
VALUES
(1, NULL, 10),
(2, 10, 10)
Note that I've explicitly referenced which attributes we're inserting into, as you should also be doing (to prevent issues with attributes being added/modified down the line).
A null value on min_value means no lower limit; a null max_value means no upper limit.
To then get results from t2 that meet your t1 criteria, simply do an INNER JOIN:
SELECT t2.*
FROM t2
INNER JOIN t1 ON
(t2.id <= t1.max_value OR t1.max_value IS NULL)
AND
(t2.id >= t1.min_value OR t1.min_value IS NULL)
Note that this returns a row for every rule a record matches, so a record matching more than one rule will appear more than once (a DISTINCT or a semi-join can collapse the duplicates). If you need more complex logic (for example, show records that meet Rules 1, 2 and 3, or meet Rule 4), you'll likely have to resort to dynamic SQL (or at the very least some ugly JOINs).
As stated in a comment, however, you want to have more complex rules, which might mean you have to end up using dynamic SQL. However, you still have the problem of validating and parsing your rule. How do you handle cases where a user enters an invalid rule?
A better solution might be to store your rules in a format that can easily be parsed and validated. For example, come up with an XML schema that defines a valid rule/criterion. Then, your Rules table would have a rule XML attribute, tied to that schema, so users could only enter valid rules. You could then either shred that XML document, or create the SQL client-side to come up with your query.
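For illustration, a minimal sketch of what such a rule document and the shredding could look like; the element and attribute names here are hypothetical, not from the question:
-- Hypothetical rule format: one comparison per <rule> element.
DECLARE @rules XML = N'
<rules>
  <rule field="id" op="lt" value="10" />
  <rule field="id" op="eq" value="10" />
</rules>';
-- Shred the XML into rows that can be validated before any SQL is generated.
SELECT
    r.value('@field', 'varchar(50)') AS field_name,
    r.value('@op',    'varchar(10)') AS operator,
    r.value('@value', 'varchar(50)') AS compare_value
FROM @rules.nodes('/rules/rule') AS t(r);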
I got the answer myself, and I am putting it below.
I have used the Python CLI to do the job (as Snowflake does not support dynamic queries).
I believe one can use the same approach for other DBs (tedious, but doable).
Setting up the configuration to connect:
import json
import snowflake.connector

CONFIG_PATH = "/root/config/snowflake.json"
with open(CONFIG_PATH) as f:
    config = json.load(f)

# snowflake credentials
snf_user = config['snowflake']['user']
snf_pwd = config['snowflake']['pwd']
snf_account = config['snowflake']['account']
snf_region = config['snowflake']['region']
snf_role = config['snowflake']['role']

ctx = snowflake.connector.connect(
    user=snf_user,
    password=snf_pwd,
    account=snf_account,
    region=snf_region,
    role=snf_role
)
I used multiple cursors because, inside the loop, we don't want a recursive connection:
cs = ctx.cursor()
cs1 = ctx.cursor()
query = "select c2 from test"
cs.execute(query)
for (x) in cs:
    y = "select * from test1 where {0}".format(', '.join(x).replace("'", ""))
    cs1.execute(y)
    for (y1) in cs1:
        print('{0}'.format(y1))
And boom, done.

SQL conversion from varchar to uniqueidentifier fails in view

I'm stuck on the following scenario.
I have a database with a table with customer data and a table where I put records for monitoring what is happening on our B2B site.
The customer table is as follows:
ID, int, not null
GUID, uniqueidentifier, not null, primary key
Other stuff...
The monitoring table:
ID, int, not null
USERGUID, uniqueidentifier, null
PARAMETER2, varchar(50), null
Other stuff...
Customer GUIDs as well as other kinds of data are stored in PARAMETER2.
Now the requirement came up to order our customers according to their last visit date; the most recently visited customers must come at the top of a grid.
I'm using Entity Framework and had problems comparing the string and the GUID type, so I decided to make a view on top of my monitoring table:
SELECT
ID,
CONVERT(uniqueidentifier, parameter2) AS customerguid,
USERguid,
CreationDate
FROM
MONITORING
WHERE
(dbo.isuniqueidentifier(parameter2) = 1)
AND
(parameter1 LIKE 'Customers_%' OR parameter1 LIKE 'Customer_%')
I imported the view into EF and wrote my LINQ query. It returned nothing, so I extracted the generated SQL query. When testing the query in SQL Server Management Studio, I got the following error:
Conversion failed when converting from a character string to uniqueidentifier.
The problem lies in the following snippet (simplified for this question, but it also gives the error):
SELECT *,
(
SELECT
[v_LastViewDateCustomer].[customerguid] AS [customerguid]
FROM [dbo].[v_LastViewDateCustomer] AS [v_LastViewDateCustomer]
WHERE c.GUID = [v_LastViewDateCustomer].[customerguid]
)
FROM CM_CUSTOMER c
But when I do a join, I get my results:
SELECT *
FROM CM_CUSTOMER c
LEFT JOIN
[v_LastViewDateCustomer] v
on c.GUID = v.customerguid
I tried to make a SQL fiddle, but on that site it works: http://sqlfiddle.com/#!3/66d68/3
Can anyone point me in the right direction?
Use
TRY_CONVERT(UNIQUEIDENTIFIER, parameter2) AS customerguid
instead of
CONVERT(UNIQUEIDENTIFIER, parameter2) AS customerguid
Views are inlined into the query and the CONVERT can run before the WHERE.
For some additional discussion, see SQL Server should not raise illogical errors.
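Applied to the view from the question, the change is a single line; everything else stays as it was:
SELECT
ID,
TRY_CONVERT(uniqueidentifier, parameter2) AS customerguid,
USERguid,
CreationDate
FROM
MONITORING
WHERE
(dbo.isuniqueidentifier(parameter2) = 1)
AND
(parameter1 LIKE 'Customers_%' OR parameter1 LIKE 'Customer_%')
TRY_CONVERT returns NULL instead of raising an error for strings that are not valid GUIDs, so the view no longer depends on the WHERE clause being evaluated first. Note that TRY_CONVERT requires SQL Server 2012 or later.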

Select with Many Discrete Values Possible in Where Clause

Given a table like
CREATE TABLE [dbo].[Article](
[Id] [int] NOT NULL,
[CategoryId] [int] NOT NULL,
[Text] [nchar](10) NOT NULL)
users are allowed to select one or more categories for which they would like to view data. Typically they will select 1-20 categories. To accommodate that, I generate parameterized queries similar to:
SELECT * FROM Article
WHERE CategoryId IN (#c1, #c2, #c3, #c4, #c5)
However, in some rare use cases a user can legitimately select hundreds of categories. This led me to discover a limitation of Linq-to-Entities, which I worked around by forming ranges of category codes. Unfortunately this only pushes off the issue, as there are limits to the size of a query that can be passed to SQL Server.
I would like to refactor this query to avoid any hard limits. My first thought was to create a temporary table containing the requested categories and perform an inner join against it in lieu of the IN (...) clause. However, I understand that temporary tables can be quite slow.
Is there a more elegant and/or better performing solution to this problem?
Your first instinct is correct, though you might find a table-valued variable sufficient in place of a temp table; see the sketch below. Don't worry about the performance in a case like this; it won't be significant. An index could always be created on the temp table if needed, but that seems unlikely. Is there an index on the CategoryId field?
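For illustration, a minimal sketch of the table-variable approach; the category ids are placeholders:
DECLARE @Categories TABLE (CategoryId INT PRIMARY KEY);
-- Populate with however many ids the user selected (placeholder values here).
INSERT INTO @Categories (CategoryId)
VALUES (1), (2), (3);
-- The join replaces the IN (...) list, with no limit on the number of values.
SELECT a.*
FROM dbo.Article a
INNER JOIN @Categories c ON c.CategoryId = a.CategoryId;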
EDIT:
Oops. I missed the Linq part.
Here is an alternate syntax that may be worth a try (for performance reasons, not for query-length reasons):
Select * from dbo.Article art
where exists (
    select null
    from ( select 0 as MyV
           union all select 2 as MyV
           union all select 3 as MyV ) as derived1
    where derived1.MyV = art.CategoryId
)
This is how I handle it.
Sometimes my table variable is changed to a #temp table; I test the two different scenarios for performance.
You can pass as many or as few values as you like via XML.
DECLARE @input XML = '<root>
<category myvalue="1" />
<category myvalue="2" />
<category myvalue="3" />
</root>'
declare @holder table ( CatID int )
Insert into @holder (CatID)
SELECT
myvalue = MyXmlTable.value('(@myvalue)', 'int')
FROM
@input.nodes('/root/category') AS Tbl(MyXmlTable)
select * from @holder
SELECT * FROM Article art
where exists (select null from @holder hold where hold.CatID = art.CategoryId)
Bigger write up here:
http://www.sqlservercentral.com/articles/Stored+Procedures/thezerotonparameterproblem/2283/