I have a situation where I am passing a bit column into COALESCE and then comparing the result to 0. The problem is that it is not working!
(Please note that I am not authorized to change the datatype of any columns)
This is what I have:
SELECT Meters.ID, Consumption, Charge
FROM Data WITH (NOLOCK)
JOIN Meters
ON Data.MeterID = Meters.ID
WHERE COALESCE(Processed, 0) = 0
The idea behind Processed is that, if the data has been processed, it should be 1, so I do not want to process it again.
Processed is a column in the Data table of type bit. My joins are definitely correct, because the query runs with no problems when I leave out the WHERE clause; the problem only appears when I add it. Even though the Processed column contains 1, 0, and NULL values, the query does not return anything! Can anybody suggest a solution? Thank you.
WHERE Processed IS NULL OR Processed = 0
is the logically equivalent expression, so I can't figure out why COALESCE is not working for you.
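For what it's worth, here is a minimal sketch of the query from the question rewritten with that expanded predicate (table and column names taken straight from the question):
SELECT Meters.ID, Consumption, Charge
FROM Data WITH (NOLOCK)
JOIN Meters
    ON Data.MeterID = Meters.ID
-- same filter as COALESCE(Processed, 0) = 0, written out explicitly
WHERE Processed IS NULL
   OR Processed = 0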
I have a simple SELECT that is running very slowly, and I have narrowed it down to one particular WHERE condition.
I am not sure if you need to see the whole query, or whether you can help me understand why the CASE is affecting performance so much. I feel like I have found the problem but can't seem to resolve it. I've worked with CASE statements before and have never run into such huge performance issues.
For this particular example, the declaration is as follows: DECLARE @lastInvOnly AS int = 0
The problematic WHERE condition follows and runs for about 20 seconds:
AND ird.inventorydate = CASE WHEN @lastinvonly = 0 THEN
    -- get the last reported inventory with respect to the specified parameter
    (SELECT MAX(ird2.inventorydate)
     FROM irdate ird2
     WHERE ird2.ris = r.ris AND
           ird2.generateddata != 'g' AND
           ird2.inventorydate <= @inventorydate)
END
Removing the CASE makes it run in 1 second, which is a HUGE difference. I can't understand why.
AND ird.inventorydate =
    (SELECT MAX(ird2.inventorydate)
     FROM irdate ird2
     WHERE ird2.ris = r.ris AND
           ird2.generateddata != 'g' AND
           ird2.inventorydate <= @inventorydate)
It should almost certainly be a derived table that you join to instead. Correlated sub-selects in the WHERE clause tend to perform poorly, and when used conditionally, even worse. Try this instead:
INNER JOIN (
    SELECT
        ris
        ,MAX(inventorydate) AS [MaxInvDate]
    FROM irdate
    WHERE generateddata != 'g'
        AND inventorydate <= @inventorydate
    GROUP BY ris
) AS MaxInvDate ON MaxInvDate.ris = r.ris
    AND ird.inventorydate = MaxInvDate.MaxInvDate
    AND @lastinvonly = 0
I'm not 100% positive this works logically with your whole query, since the question only shows a small part of it.
I can't tell for sure without seeing an execution plan, but the branch in your filter is likely the cause of the performance problem. Theoretically, the optimizer can take the version without the CASE and apply an optimization that transforms the subquery in your filter into a join; when the CASE expression is added, that optimization is no longer possible and the subquery is executed for every row. You can refactor the code to help the optimizer out; something like this should work:
outer apply (
    select max(ird2.inventorydate) as maxinventorydate
    from irdate ird2
    where ird2.ris = r.ris
        and ird2.generateddata <> 'g'
        and ird2.inventorydate <= @inventorydate
        and @lastinvonly = 0
) as ird2
where ird.inventorydate = ird2.maxinventorydate
I have a database that contains data points from various sensors. These data points are taken twice a minute.
I am using the following SQL query to attempt to get two separate data points at two different times.
SELECT *
FROM TableName
WHERE TagName = 'TagName'
AND ("DateTime" = 'X' OR "DateTime" = 'Y')
I see no reason why this should not return 2 data points, one for the first date, and one for the second, but for some reason the query only returns the row for Y.
I feel like I am missing something extremely obvious.
For a bit more context, this query is used in conjunction with a Python script to grab data between two dates using a specific resolution.
Any ideas?
EDIT: I believe it has something to do with the structure of the database and how it operates in terms of giving entries a time value.
It needs more investigation on my part, so I'm going to flag this question for deletion. Thanks, all, for your help.
Not sure if that is a typo in your question, but if not, the problem is probably the double quotes after the AND part of the statement:
AND ("DateTime" = 'X' OR "DateTime" = 'Y')
It should be AND (DateTime = 'X' OR DateTime = 'Y'). Notice there are no quotes around the column names.
http://sqlfiddle.com/#!2/ed00ba/3
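For reference, a sketch of the full corrected query ('X' and 'Y' are placeholders for the two literal timestamps, as in the question):
SELECT *
FROM TableName
WHERE TagName = 'TagName'
  -- column name left unquoted
  AND (DateTime = 'X' OR DateTime = 'Y')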
I have this piece of code, and I am not sure how it would work:
UPDATE Data
SET Processed = 1
FROM Data
JOIN Meters
ON Meters.ServiceAccount = serv_acct
where COALESCE(Processed, 0) = 0
My question is about the last line! Would that line ever be true in this case?
Since I am setting Processed to 1, how would that work:
where COALESCE(Processed, 0) = 0?
Can anybody explain the logic of using Coalesce in this way?
This code was not written by me.
Thank you
Your query is:
UPDATE Data
SET Processed = 1
FROM Data JOIN
Meters
ON Meters.ServiceAccount = serv_acct
where COALESCE(Processed, 0) = 0;
An UPDATE query determines the set of rows it is acting on before any of the changes are made. So, the final line is selecting rows where Processed is either NULL or 0, and the update then sets Processed to 1 for those rows. In other words, the WHERE clause acts as a filter on the rows to modify, and it is evaluated against the values as they were before the update.
The COALESCE function is described here:
http://technet.microsoft.com/en-us/library/ms190349.aspx
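Here is a minimal sketch that demonstrates this behaviour on a throwaway temp table (the table and its values are made up purely for illustration):
-- hypothetical demo table
CREATE TABLE #Demo (ID int, Processed bit NULL);
INSERT INTO #Demo VALUES (1, NULL), (2, 0), (3, 1);

UPDATE #Demo
SET Processed = 1
WHERE COALESCE(Processed, 0) = 0;   -- filter is evaluated against the pre-update values

-- rows 1 and 2 were updated to 1; row 3 already had 1 and was left alone
SELECT * FROM #Demo;
DROP TABLE #Demo;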
I think the reason behind using the predicate WHERE COALESCE(Processed, 0) = 0 was to select all rows whose Processed column is NULL or equal to 0.
Instead, I would use these predicates:
UPDATE Data
SET Processed = 1
FROM Data JOIN
Meters
ON Meters.ServiceAccount = serv_acct
where Processed IS NULL OR Processed = 0;
because they are SARGable, which means SQL Server can use an Index Seek.
Applying an expression to the Processed column forces SQL Server to choose a [Clustered] Index Scan instead.
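As a rough way to see the difference yourself, assuming there is (or you add) an index on Processed (the index name below is made up), you could compare the I/O and plans of the two forms:
-- hypothetical index; only needed if one does not already exist
CREATE NONCLUSTERED INDEX IX_Data_Processed ON Data (Processed);

SET STATISTICS IO ON;

-- non-SARGable: the expression hides the column, typically forcing a scan
SELECT COUNT(*) FROM Data WHERE COALESCE(Processed, 0) = 0;

-- SARGable: the bare column can be matched against the index
SELECT COUNT(*) FROM Data WHERE Processed IS NULL OR Processed = 0;

SET STATISTICS IO OFF;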
I wrote the following query:
UPDATE king_in
SET IN_PNSN_ALL_TP_CNTRCT_CD = IN_PNSN_ALL_TP_CNTRCT_CD + '3'
WHERE COALESCE(IN_PNSN_ALL_TP_CNTRCT_TX, '') <> ''
AND CHARINDEX('3', IN_PNSN_ALL_TP_CNTRCT_CD) = 0
It checks to see whether a field has a value in it, and if it does, it puts a 3 in a corresponding field if there isn't a 3 already in it. When I ran it, I got a "string or binary data will be truncated" error. The field is a VARCHAR(3), and there are rows in the table that already have 3 characters in them, but the rows that I was actually updating via the WHERE filter had a MAX LEN of 2, so I was completely baffled as to why SQL Server was throwing the truncation error. So I changed my UPDATE statement to:
UPDATE king_in
SET IN_PNSN_ALL_TP_CNTRCT_CD = k.IN_PNSN_ALL_TP_CNTRCT_CD + '3'
FROM king_in k
INNER JOIN
(
SELECT ki.row_key,
in_sqnc_nb
FROM king_in ki
INNER JOIN King_Ma km
ON ki.Row_Key = km.Row_Key
INNER JOIN King_Recs kr
ON km.doc_loc_nb = kr.ACK_ID
WHERE CHARINDEX('3', IN_PNSN_ALL_TP_CNTRCT_CD) = 0
AND COALESCE(IN_PNSN_ALL_TP_CNTRCT_TX, '') <> ''
) a
ON k.Row_Key = a.Row_Key
AND k.in_sqnc_nb = a.in_sqnc_nb
and it works fine without error.
So it appears, based on this, that when doing an UPDATE statement without a FROM clause, SQL Server internally runs the SET expression before it filters the records based on the WHERE clause. That's why I was getting the truncation error: even though the records I wanted to update were less than 3 characters, there were rows in the table that had 3 characters in that field, and when it couldn't append a '3' to one of those rows, it threw the error.
So after all of that, I've got a handful of questions.
1) Why? Is there a specific DBMS reason that SQL Server wouldn't filter the result set before applying the SET statement?
2) Is this just a known thing about SQL that I never learned along the way?
3) Is there a setting in SQL Server to change this behavior?
Thanks in advance.
1 - Likely because your criteria are not SARGable - that is, they can't use an index. If the query optimizer determines it's faster to do a table scan, it'll go ahead and evaluate the expression against all the rows. This is especially likely when you filter on a function applied to the field, as you do here.
2 - Yes. The optimizer will do what it thinks is best. You can get around this somewhat by using parentheses to force an evaluation order in your WHERE clause, but in your example I don't think it would help, since it forces a table scan regardless.
3 - No, you need to alter your data or your logic to allow indexes to be used. If you really need to filter on the existence of a certain character in a field, it probably should be its own column and/or you should normalize that particular bit of data better.
A workaround for your particular instance would be to add a LEN(IN_PNSN_ALL_TP_CNTRCT_CD) < 3 condition to the WHERE clause as well, as sketched below.
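A sketch of what that workaround could look like; the CASE guard in the SET clause is an extra belt-and-braces measure (not part of the answer above) that keeps the expression from ever producing more than three characters, regardless of evaluation order:
UPDATE king_in
SET IN_PNSN_ALL_TP_CNTRCT_CD = CASE
        WHEN LEN(IN_PNSN_ALL_TP_CNTRCT_CD) < 3
            THEN IN_PNSN_ALL_TP_CNTRCT_CD + '3'
        ELSE IN_PNSN_ALL_TP_CNTRCT_CD   -- already 3 characters: leave untouched
    END
WHERE COALESCE(IN_PNSN_ALL_TP_CNTRCT_TX, '') <> ''
  AND CHARINDEX('3', IN_PNSN_ALL_TP_CNTRCT_CD) = 0
  AND LEN(IN_PNSN_ALL_TP_CNTRCT_CD) < 3   -- the suggested extra filter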
I'm just wondering what is faster in SQL (specifically SQL Server).
I could have a nullable column of type Date and compare that to NULL, or I could have a non-nullable Date column and a separate bit column, and compare the bit column to 1/0.
Is the comparison to the bit column going to be faster?
In order to check whether a column IS NULL, SQL Server would actually just check a bit anyway: there is a NULL bitmap stored for each row indicating whether each column contains a NULL or not.
I just did a simple test for this:
DECLARE @d DATETIME
       ,@b BIT = 0

SELECT 1
WHERE @d IS NULL

SELECT 2
WHERE @b = 0
The actual execution plan results show the computation as exactly the same cost relative to the batch.
Maybe someone can tear this apart, but to me it seems there's no difference.
MORE TESTS
SET DATEFORMAT ymd;
CREATE TABLE #datenulltest
(
dteDate datetime NULL
)
CREATE TABLE #datebittest
(
dteDate datetime NOT NULL,
bitNull bit DEFAULT (1)
)
-- half the rows get a date, half get NULL
INSERT INTO #datenulltest ( dteDate )
SELECT CASE WHEN CONVERT(bit, number % 2) = 1 THEN '2010-08-18' ELSE NULL END
FROM master..spt_values

-- same split, but expressed as a bit flag instead of a NULL date
INSERT INTO #datebittest ( dteDate, bitNull )
SELECT '2010-08-18', CASE WHEN CONVERT(bit, number % 2) = 1 THEN 0 ELSE 1 END
FROM master..spt_values
SELECT 1
FROM #datenulltest
WHERE dteDate IS NULL
SELECT 2
FROM #datebittest
WHERE bitNull = CONVERT(bit, 1)
DROP TABLE #datenulltest
DROP TABLE #datebittest
(Execution plan screenshots for the dteDate IS NULL result and the bitNull = 1 result omitted; both showed the same relative cost.)
OK, so this extended test comes up with the same responses again.
We could do this all day - it would take some very complex query to find out which is faster on average.
All other things being equal, I would say the bit would be faster because it is a "smaller" data type. However, if performance is very important here (and I assume it is, because of the question), then you should always test, as there may be other factors, such as indexes and caching, that affect this.
It sounds like you are trying to decide on a datatype for a field that will record whether an event X has happened or not: either a timestamp (when X happened) or just a bit (1 if X happened, otherwise 0). In this case I would be tempted to go for the date, as it gives you more information (not only whether X happened, but also exactly when), which will most likely be useful in the future for reporting purposes. Only go against this if the minor performance gain really is more important.
Short answer: if you have only 1s and 0s, something like a bitmap index on 1/0 is extremely fast. NULLs are not indexed on certain SQL engines, so IS NULL and IS NOT NULL checks can be slow. However, think about the entity semantics before reaching for this; it is always better to have a semantically meaningful table definition, if you know what I mean.
The speed comes from the ability to use indexes, not from data size in this case.
Edit
Please refer to Martin Smith's answer. That makes more sense for SQL Server; I got carried away by Oracle, my mistake here.
The bit will be faster, as loading the bit into memory loads only 1 byte while loading the date takes 8 bytes. The comparison itself takes the same time, but loading from disk takes longer. Unless you use a very old server or need to load more than 10^8 rows, you won't notice anything.
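As a quick sanity check of the sizes being discussed (note that at the storage level up to eight bit columns share a single byte, but a lone bit column still costs one byte):
SELECT DATALENGTH(CAST(1 AS bit))              AS BitBytes,      -- 1
       DATALENGTH(CAST(GETDATE() AS date))     AS DateBytes,     -- 3
       DATALENGTH(CAST(GETDATE() AS datetime)) AS DateTimeBytes  -- 8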