SQL Server subquery behaviour

I have a case where I want to check whether an integer value is found in a varchar column that holds a mix of values: some can be interpreted as integers, others are just strings. My first thought was to use a subquery to select only the rows with numeric-looking values. The setup looks like:
CREATE TABLE #tmp (
EmployeeID varchar(50) NOT NULL
)
INSERT INTO #tmp VALUES ('aa1234')
INSERT INTO #tmp VALUES ('1234')
INSERT INTO #tmp VALUES ('5678')
DECLARE @eid int
SET @eid = 5678
SELECT *
FROM (
SELECT EmployeeID
FROM #tmp
WHERE IsNumeric(EmployeeID) = 1) AS UED
WHERE UED.EmployeeID = @eid
DROP TABLE #tmp
However, this fails, with: "Conversion failed when converting the varchar value 'aa1234' to data type int.".
I don't understand why it is still trying to compare @eid to 'aa1234' when I've selected only the rows '1234' and '5678' in the subquery.
(I realize I can just cast @eid to varchar, but I'm curious about SQL Server's behaviour in this case.)

You can't easily control the order in which things happen when SQL Server takes the query you wrote and determines the optimal execution plan. The plan won't always follow the logic you typed, in the same order.
In this case, in order to find the rows you're looking for, SQL Server has to perform two filters:
identify only the rows that match your variable
identify only the rows that are numeric
It can do this in either order, so this is also valid:
identify only the rows that are numeric
identify only the rows that match your variable
If you look at the properties of the execution plan, you'll see that the predicate matching your variable is listed first (which still doesn't guarantee order of operation); in any case, because of data type precedence, SQL Server has to try to convert the column data to the type of the variable.
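As a quick standalone illustration of that precedence rule (this example is mine, not from the original post): int outranks varchar, so it is always the string side that gets converted:
-- Succeeds: '1234' is implicitly converted to the int 1234
SELECT CASE WHEN '1234' = 1234 THEN 'match' END;
-- Fails at runtime with the same conversion error, because 'aa1234' cannot become an int
SELECT CASE WHEN 'aa1234' = 1234 THEN 'match' END;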
Subqueries, CTEs, or writing the query a different way - especially in simple cases like this - are unlikely to change the order SQL Server uses to perform those operations.
You can force evaluation order in most scenarios by using a CASE expression (you also don't need the subquery):
SELECT EmployeeID
FROM #tmp
WHERE EmployeeID = CASE IsNumeric(EmployeeID) WHEN 1 THEN @eid END;
In modern versions of SQL Server (2012 and later; you didn't tell us which version you use), you can also use TRY_CONVERT() instead:
SELECT EmployeeID
FROM #tmp
WHERE TRY_CONVERT(int, EmployeeID) = @eid;
This is essentially shorthand for the CASE expression, but with the added bonus that it allows you to specify an explicit type, which is one of the downsides of ISNUMERIC(). All ISNUMERIC() tells you is if the value can be converted to any numeric type. The string '1e2' passes the ISNUMERIC() check, because it can be converted to float, but try converting that to an int...
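For example (a standalone illustration, not from the original answer):
SELECT ISNUMERIC('1e2')          AS LooksNumeric,  -- 1, because '1e2' can become a float
       TRY_CONVERT(float, '1e2') AS AsFloat,       -- 100
       TRY_CONVERT(int, '1e2')   AS AsInt;         -- NULL: not a valid int literal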
For completeness, the best solution - if there is an index on EmployeeID - is to just use a variable that matches the column data type, as you suggested.
But even better would be to use a data type that prevents junk data like 'aa1234' from getting into the table in the first place.
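A minimal sketch of that last suggestion against the question's setup: simply declare the variable with the column's data type, so the column never has to be converted and an index on EmployeeID could be used:
DECLARE @eid varchar(50) = '5678';   -- same type as the EmployeeID column

SELECT EmployeeID
FROM #tmp
WHERE EmployeeID = @eid;             -- no implicit conversion of the column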

Related

Change Datatype of json_value during select into so I can sum column

I have a column in a table that is json. It contains several columns within it.
Example:
Row1: "sTCounts":[{"dpsTypeTest":"TESTTRIAL","cnt":3033244.0}
Row2: "sTCounts":[{"dpsTypeTest":"TESTTRIAL","cnt":3.3}
I need to sum the cnt value for all rows in table. For instance, the above would produce a result of 3033247.3
I'm not familiar enough with stored procs to master this. I thought the easiest route would be to create a temp table, extract the value into a column, and then write a query to sum the column values.
The problem is that it creates a column with datatype nvarchar(4000). It won't let me sum that column. I thought of changing the datatype but not sure how. I am trying CAST without luck.
select CAST(json AS varchar) AS JSON_VALUE(jsontext,
'$.sTCounts.cnt') AS PerfCount, TitleNumber
INTO dbo_Testing_Count0
from PerformanceTest
select sum(PerfCount)
from dbo_Testing_Count
Group by PerfCount
The error message is:
Incorrect syntax near 'jsontext'.
Any ideas? I am open to another method to sum the column or changing the datatype whichever the experts can aid on. I appreciate it.
The JSON you provide in your question is not valid... It seems to be just a fragment of a larger JSON document. As the value of "sTCounts" starts with a [ you have to think of it as an array, so the simple JSON path '$.sTCounts.cnt' probably won't work.
Try this, I've added the opening { and the closing brackets at the end:
DECLARE @mockupTable TABLE(ID INT IDENTITY, YourJson NVARCHAR(MAX));
INSERT INTO @mockupTable VALUES
 (N'{"sTCounts":[{"dpsTypeTest":"TESTTRIAL","cnt":3033244.0}]}')
,(N'{"sTCounts":[{"dpsTypeTest":"TESTTRIAL","cnt":3.3}]}');
--You can read one scalar value using JSON_VALUE directly with a cast. But in this case I need to add [0]. This tells the engine to read the first (zero-based index!) object's cnt property.
SELECT CAST(JSON_VALUE(YourJson,'$.sTCounts[0].cnt') AS DECIMAL(14,4))
FROM @mockupTable;
--But I think this is what you are actually looking for:
SELECT *
FROM @mockupTable
CROSS APPLY OPENJSON(YourJson,'$.sTCounts')
     WITH(dpsTypeTest varchar(100)
         ,cnt decimal(14,4));
The WITH clause returns the objects' properties as typed columns side by side.
To keep going, you can wrap this in a CTE and continue working with the result set in the following SELECT.
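A minimal sketch of that CTE idea, reusing the mockup table above and producing the total the question asks for (the alias names are mine):
WITH parsed AS
(
    SELECT j.cnt
    FROM @mockupTable m
    CROSS APPLY OPENJSON(m.YourJson,'$.sTCounts')
         WITH (dpsTypeTest varchar(100)
              ,cnt decimal(14,4)) AS j
)
SELECT SUM(cnt) AS TotalCnt   -- 3033244.0 + 3.3 = 3033247.3
FROM parsed;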

T-Sql - Select query in another select query takes long time

I have a procedure with arguments, but calling it takes a very long time. I checked what was wrong with my query and came to the conclusion that the problem is the Column IN (SELECT ...) construct.
Both queries return 1500 rows.
First query: 45 seconds
Second query: 0 seconds
1.
declare @FILTER_OPTION int
declare @ID_DISTRIBUTOR type_int_value
declare @ID_DATA_TYPE type_bigint_value
declare @ID_AGGREGATION_TYPE type_int_value
set @FILTER_OPTION = 8
insert into @ID_DISTRIBUTOR values (19)
insert into @ID_DATA_TYPE values (30025)
insert into @ID_AGGREGATION_TYPE values (10)
SELECT * FROM dbo.[DATA] WHERE
[ID_DISTRIBUTOR] IN (select [VALUE] from @ID_DISTRIBUTOR)
AND [ID_DATA_TYPE] IN (select [VALUE] from @ID_DATA_TYPE)
AND [ID_AGGREGATION_TYPE] IN (select [VALUE] from @ID_AGGREGATION_TYPE)
2.
select * FROM dbo.[DATA] WHERE
[ID_DISTRIBUTOR] IN (19)
AND [ID_DATA_TYPE] IN (30025)
AND [ID_AGGREGATION_TYPE] IN (10)
Why is this happening?
How should I create a stored procedure that takes an array of arguments to use it quickly?
Edit:
Maybe it's a problem with indexes? Indexes are created on these three columns.
For such a large performance difference, I would guess that you have one or more indexes. In particular, if you have an index on (ID_DISTRIBUTOR, ID_DATA_TYPE, ID_AGGREGATION_TYPE), then the second query can make use of the index. SQL Server can recognize that the IN is really = and the query is a simple lookup.
In the first case, SQL Server doesn't "know" that the subqueries really have only one row in them. That requires a different set of optimizations. In particular, the above index cannot be used, because the IN generally optimizes differently from =.
As for what to do: first, look at the execution plans so you can see the difference between the two versions. Then test the second version with more than one value in the IN lists.
If you can live with just one value for each comparison, then use = rather than IN.
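If such an index doesn't already exist, a sketch of the idea is below (the index name and column order are assumptions based on the three filters); adding OPTION (RECOMPILE) to the first query also lets the optimizer see how many rows the table variables actually contain:
-- Hypothetical composite index covering the three filter columns
CREATE INDEX IX_DATA_Dist_Type_Agg
    ON dbo.[DATA] (ID_DISTRIBUTOR, ID_DATA_TYPE, ID_AGGREGATION_TYPE);

-- Recompiling at runtime exposes the table variables' cardinality to the optimizer
SELECT *
FROM dbo.[DATA]
WHERE [ID_DISTRIBUTOR] IN (select [VALUE] from @ID_DISTRIBUTOR)
  AND [ID_DATA_TYPE] IN (select [VALUE] from @ID_DATA_TYPE)
  AND [ID_AGGREGATION_TYPE] IN (select [VALUE] from @ID_AGGREGATION_TYPE)
OPTION (RECOMPILE);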

Optimizing stored procedure with multiple "LIKE"s

I am passing in a comma-delimited list of values that I need to compare to the database
Here is an example of the values I'm passing in:
@orgList = "1123, 223%, 54%"
To use the wildcards I think I have to use LIKE, but the query runs a long time and only returns 14 rows (the results are correct, but it's just taking forever, probably because I'm using the join incorrectly).
Can I make it better?
This is what I do now:
declare @tempTable Table (SearchOrg nvarchar(max) )
insert into @tempTable
select * from dbo.udf_split(@orgList) as split
-- this splits the values at the comma and puts them in a temp table
-- then I do a join on the main table and the temp table to do a like on it....
-- but I think it's not right because it's too long.
select something
from maintable gt
join @tempTable tt on gt.org like tt.SearchOrg
where
AYEAR = ISNULL(@year, ayear)
and (AYEAR >= ISNULL(@yearR1, ayear) and ayear <= ISNULL(@yearr2, ayear))
and adate = ISNULL(@Date, adate)
and (adate >= ISNULL(@dateR1, adate) and adate <= ISNULL(@DateR2, adate))
The final result would be all rows where maintable.org is 1123, or starts with 223, or starts with 54.
The reason for my date craziness is that sometimes the stored procedure only checks for a year, sometimes for a year range, sometimes for a specific date and sometimes for a date range... everything that's not used is passed in as null.
Maybe the problem is there?
Try something like this:
Declare @tempTable Table
(
-- Since the column is a varchar(10), you don't want to use nvarchar here.
SearchOrg varchar(20)
);
INSERT INTO @tempTable
SELECT * FROM dbo.udf_split(@orgList);
SELECT
something
FROM
maintable gt
WHERE
some where statements go here
And
Exists
(
SELECT 1
FROM @tempTable tt
WHERE gt.org Like tt.SearchOrg
)
Such a dynamic query, with optional filters and LIKE patterns driven by a table (!), is very hard to optimize because almost nothing is statically known. The optimizer has to create a very general plan.
You can do two things to speed this up by orders of magnitude:
Play with OPTION (RECOMPILE). If the compile times are acceptable this will at least deal with all the optional filters (but not with the LIKE table).
Do code generation and EXEC sp_executesql the code. Build a query with all the LIKE clauses inlined into the SQL so that it looks like this: WHERE a LIKE @like0 OR a LIKE @like1 ... (not sure if you need OR or AND). This allows the optimizer to get rid of the join and just execute a normal predicate; see the sketch below.
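A minimal sketch of that code-generation idea, assuming @orgList has already been split into @tempTable as in the question and that maintable.org and the column something exist; for brevity the patterns are inlined as escaped literals rather than passed as individual parameters, and the optional date filters are left out:
DECLARE @sql nvarchar(max);

-- Build one LIKE predicate per pattern; REPLACE doubles any embedded single quotes
SELECT @sql = COALESCE(@sql + N' OR ', N'')
            + N'gt.org LIKE N''' + REPLACE(SearchOrg, '''', '''''') + N''''
FROM @tempTable;

SET @sql = N'SELECT gt.something FROM maintable gt WHERE ' + @sql;

EXEC sp_executesql @sql;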
Your query may be difficult to optimize. Part of the question is what is in the where clause. You probably want to filter these first, and then do the join using like. Or, you can try to make the join faster, and then do a full table scan on the results.
SQL Server should optimize a LIKE predicate of the form 'abc%' -- that is, where the wildcard is at the end. So, you can start with an index on maintable.org. Fortunately, your examples meet this criterion. However, if you have '%abc' -- where the wildcard comes first -- then the optimization won't work.
For the index to work best, it might also need to take into account the conditions in the where clause. In other words, adding the index is suggestive, but the rest of the query may preclude the use of the index.
And, let me add, the best solution for these types of searches is to use the full-text search capability in SQL Server.
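If you go the full-text route, a prefix query would look roughly like this (a sketch only; it assumes a full-text index has already been created on maintable.org, which has its own setup requirements):
-- Rows where org contains a word starting with 223
SELECT something
FROM maintable
WHERE CONTAINS(org, '"223*"');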

SQL - ORDER BY running first

Please have a look at this database schema:
create table Person (id int not null identity,
[index] varchar(30),
datecreated datetime,
groupid int)
create table [Group] (id int identity not null, description varchar(30))
Sample data:
insert into Person ([index],datecreated,groupid) values ('4,5,6','2011-01-01',1)
insert into Person ([index],datecreated,groupid) values ('1,2,3','2011-02-02',1)
insert into Person ([index],datecreated,groupid) values ('7,8','2012-02-02',2)
insert into [Group] (description) values ('TestGroup')
insert into [Group] (description) values ('TestGroup2')
Please have a look at the SQL statement below:
select *
from Person
inner join [Group] on Person.groupid = [group].id
where [group].description = 'TestGroup'
order by
left(substring([index], charindex(',', [index]) + 1, 200),
charindex(',', substring([index], charindex(',', [index]) + 1, 200)) - 1)
This SQL statement fails with the following error:
Invalid length parameter passed to the SUBSTRING function.
It is the order by clause that is causing this error i.e. it is trying to find the third element of the index column but the third element does not exist on row 3 (there are only two elements).
However, I would expect the [group].description = 'TestGroup' to filter out record three. This does not appear to be the case. It is as if the order by clause is being run before the where clause. If you exclude the order by clause from the query, then the query runs.
Why is this?
Evaluation order in SQL comes with very weak guarantees. Most likely the sort expression is being computed before the filter has removed the offending row; nothing is wrong with that in itself.
You cannot rely on execution order in general. The exception is a CASE expression, which you can use to produce a dummy value such as NULL in your ORDER BY when the input to SUBSTRING would be invalid. CASE is the only way to enforce evaluation order.
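A minimal sketch of that CASE guard applied to the ORDER BY from the question (the '%,%,%' test is my assumption about what makes the expression safe: it requires at least two commas, so the element between them exists):
SELECT *
FROM Person
INNER JOIN [Group] ON Person.groupid = [Group].id
WHERE [Group].description = 'TestGroup'
ORDER BY
    CASE WHEN [index] LIKE '%,%,%'   -- only evaluate SUBSTRING when a second comma exists
         THEN LEFT(SUBSTRING([index], CHARINDEX(',', [index]) + 1, 200),
                   CHARINDEX(',', SUBSTRING([index], CHARINDEX(',', [index]) + 1, 200)) - 1)
    END;                             -- rows without enough commas sort together as NULL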
This ORDER BY is pretty brutal. I would suggest breaking this into a couple of queries, using a temp-table or table sub-expression, so you can do your filtering first, and/or create a column containing the data to sort by.
Remember, SQL is a declarative language, not a procedural language. That is, you describe the result sets that you want. You depend on the SQL compiler/optimizer to set up the execution plan.
Very typically, a SQL engine will have a component that reads the data from the table and does all the calculations that are needed for that data. Of course, this includes calculations in the SELECT clause, but also calculations in "ON" clauses, "WHERE" clauses, and "ORDER BY" clauses.
The engine can then do the filtering after reading the data. This enables the engine to readily use computed values for the filtering.
I am not saying that all databases work this way. What I am saying is that there is no guarantee of the order of operations in a SQL statement. This situation is one of the cases where doing things in the wrong order results in an error, which prevents the SQL from completing. Do you want help rewriting the query so it doesn't get the error?

Error converting data type varchar

I currently have a table with a column as varchar. This column can hold numbers or text. During certain queries I treat it as a bigint column (I do a join between it and a column in another table that is bigint)
As long as there were only numbers in this field I had no trouble, but the minute even one row had text rather than numbers in this field, I got an "Error converting data type varchar to bigint." error, even though the WHERE clause made sure none of the text rows came up.
To solve this I created a view as follows:
SELECT TOP (100) PERCENT ID, CAST(MyCol AS bigint) AS MyCol
FROM MyTable
WHERE (isnumeric(MyCol) = 1)
But even though the view shows only the rows with numeric values and casts MyCol to bigint, I still get an "Error converting data type varchar to bigint" error when running the following query:
SELECT * FROM MyView where mycol=1
When doing queries against the view, the query shouldn't know what is going on behind it! It should simply see the bigint columns (even SQL Server Management Studio shows the view's fields as being bigint).
OK. I finally created a view that works:
SELECT TOP (100) PERCENT id, CAST(CASE WHEN IsNumeric(MyCol) = 1 THEN MyCol ELSE NULL END AS bigint) AS MyCol
FROM dbo.MyTable
WHERE (MyCol NOT LIKE '%[^0-9]%')
Thanks to AdaTheDev and CodeByMoonlight. I used your two answers to get to this. (Thanks to the other repliers too of course)
Now when I do joins with other bigint cols, or do something like 'SELECT * FROM MyView where mycol=1', it returns the correct result with no errors. My guess is that the CAST in the query itself causes the query optimizer not to look at the original table, as Christian Hayter said may be going on with the other views.
Ideally, you want to try to avoid storing the data in this form - would be worth splitting the BIGINT data out in to a separate column for both performance and ease of querying.
However, you can do a JOIN like this example. Note, I'm not using ISNUMERIC() to determine if it's a valid BIGINT because that would validate incorrect values which would cause a conversion error (e.g. decimal numbers).
DECLARE @MyTable TABLE (MyCol VARCHAR(20))
DECLARE @OtherTable TABLE (Id BIGINT)
INSERT @MyTable VALUES ('1')
INSERT @MyTable VALUES ('Text')
INSERT @MyTable VALUES ('1 and some text')
INSERT @MyTable VALUES ('1.34')
INSERT @MyTable VALUES ('2')
INSERT @OtherTable VALUES (1)
INSERT @OtherTable VALUES (2)
INSERT @OtherTable VALUES (3)
SELECT *
FROM @MyTable m
JOIN @OtherTable o ON CAST(m.MyCol AS BIGINT) = o.Id
WHERE m.MyCol NOT LIKE '%[^0-9]%'
Update:
The only way I can find to make it work with a WHERE clause for a specific integer value, without doing another CAST() on the supposedly-bigint column in the WHERE clause too, is to use a user-defined function:
CREATE FUNCTION [dbo].[fnBigIntRecordsOnly]()
RETURNS @Results TABLE (BigIntCol BIGINT)
AS
BEGIN
INSERT @Results
SELECT CAST(MyCol AS BIGINT)
FROM MyTable
WHERE MyCol NOT LIKE '%[^0-9]%'
RETURN
END
GO

SELECT * FROM [dbo].[fnBigIntRecordsOnly]() WHERE BigIntCol = 1
I don't really think this is a great idea performance wise, but it's a solution
To answer your question about the error message: when you reference a view name in another query (assuming it's a traditional view not a materialised view), SQL Server effectively does a macro replacement of the view definition into the consuming query and then executes that.
The advantage of doing this is that the query optimiser can do a much better job if it sees the whole query, rather than optimising the view separately as a "black box".
A consequence is that if an error occurs, error descriptions may look confusing because the execution engine is accessing the underlying tables for the data, not the view.
I'm not sure how materialised views are treated, but I would imagine that they are treated like tables, since the view data is cached in the database.
Having said that, I agree with previous answers - you should re-think your table design and separate out the text and integer data values into separate columns.
Try changing your view to this:
SELECT TOP 100 PERCENT ID,
Cast(Case When IsNumeric(MyCol) = 1 Then MyCol Else null End AS bigint) AS MyCol
FROM MyTable
WHERE (IsNumeric(MyCol) = 1)
Have you tried converting the other table's bigint field into varchar instead? To me it makes sense to perform the more robust conversion in that direction... It shouldn't affect your performance too much if the varchar field is indexed.
Consider creating a redundant bigint field to hold the integer value of MyCol.
You may then index the new field to speed up the join.
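A minimal sketch of that idea using an indexed computed column (the column and index names are mine, and it assumes every digit-only value actually fits in a bigint):
-- Hypothetical persisted computed column: NULL for anything that isn't all digits
ALTER TABLE MyTable
    ADD MyColBigInt AS
        (CASE WHEN MyCol NOT LIKE '%[^0-9]%' THEN CAST(MyCol AS bigint) END) PERSISTED;

-- Index it so joins against the other table's bigint column can seek
CREATE INDEX IX_MyTable_MyColBigInt ON MyTable (MyColBigInt);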
Try using this:
SELECT
ID,
CAST(MyCol AS bigint) as MyCol
FROM
(
SELECT TOP (100) PERCENT
ID,
MyCol
FROM
MyTable
WHERE
(isnumeric(MyCol) = 1)
) as tmp
This should work since the inner select only returns numeric values, so the outer select can convert all values from the first select into a numeric type. It seems that in your own code SQL Server tries to cast before executing the ISNUMERIC function (maybe it has something to do with optimization).
Try doing the select in 2 stages.
First create a view that selects all rows where MyCol is numeric.
Then do a select on that view where you cast the varchar field.
The other thing you could look at is your table design, to remove the need for the cast.
EDIT
Are some of the numbers larger than bigint?
Are there any spaces (leading, trailing, or inside the number)?
Are there any format characters? Decimal points?
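A quick diagnostic along those lines (just a sketch, using MyTable/MyCol from the question) to surface any rows that would break the conversion:
SELECT MyCol
FROM MyTable
WHERE MyCol LIKE '%[^0-9]%'   -- spaces, signs, decimal points or other non-digit characters
   OR LEN(MyCol) > 19;        -- more digits than bigint can hold (19-digit values can still overflow)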