TSQL BIT = 0 and <> 1

Are there any differences if I write
select * from table where bit_field = 0
or
select * from table where bit_field <> 1
?
UPDATE:
I've got a report that it is better to use the "bit_field = 0" version, because if you have an index on the field and use the second option, there is some issue with "Seek Predicates": the plan ends up with two seek predicates, one with < 1 and the other with > 1... or something like that.
I don't have any example to show. :(
After the change from <> to =, there is a decrease in exclusive locks (X) and intent exclusive locks (IX).
Any thoughts on that?

There should be no difference between the two queries. If there are no NULL values present, then a value that is zero is, by definition, not one. If NULLs are present, nothing changes, because a NULL row is not returned by either query.
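A minimal demo of this (a sketch using a throwaway table variable, T-SQL assumed):
DECLARE @t TABLE (id int, bit_field bit);
INSERT INTO @t (id, bit_field) VALUES (1, 0), (2, 1), (3, NULL);

-- Both queries return only the row with id = 1;
-- the NULL row is excluded by each predicate
SELECT * FROM @t WHERE bit_field = 0;
SELECT * FROM @t WHERE bit_field <> 1;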

There should be no difference, because a bit only takes on the values 0 and 1 (or NULL).
Both queries will exclude NULL values.

Related

Select ' ' from TableA

What exactly does
select '' from TableA
do?
When I run it on a given table I get back one blank record for every row in the table, under the header '(No column name)' because no alias was used.
I have seen this query used as a subquery in 'not exists' statements.
At what times would this query be useful and is it a good practice to query this way?
For instance, when I first saw it I thought it would return one blank row, but in fact it returns all rows in the table, and they are blank.
I've looked around and haven't found an answer for this.
Thank you
When checking whether something exists in a table, it is common to select an arbitrary constant rather than an actual column, because it can have an effect on the execution plan (if you select a real column, the execution plan takes that column into account and can take a little longer, even though you never use the column).
Most commonly, I have seen 1:
IF EXISTS (SELECT 1 FROM MyTable WHERE SomeColumn > 10)
If you just care whether there is any row at all, you can short-circuit the query rather than getting all rows... although I suspect the EXISTS check would stop as soon as any row was found anyway.
IF EXISTS (SELECT TOP 1 '' FROM TableA)
You would use this syntax if you want to add a static value as part of your query for any reason. E.g.:
SELECT 'SELECT TOP 10 * FROM ' + name
FROM sys.objects
WHERE type = 'U'
This will automatically generate a query for every user table in your database.
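If any table names contain special characters or reserved words, a safer variant (a sketch using the same system views) quotes the schema and object names:
SELECT 'SELECT TOP 10 * FROM '
       + QUOTENAME(SCHEMA_NAME(schema_id)) + '.' + QUOTENAME(name)
FROM sys.objects
WHERE type = 'U'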

Query to compare 3 different data columns with a reference value

I need to check whether a table (in an Oracle DB) contains entries that were updated after a certain date. "Updated" in this case means any of 3 columns (DateCreated, DateModified, DateDeleted) have a value greater than the reference.
The query I have come up so far is this
select * from myTable
where DateCreated > :reference_date
or DateModified > :reference_date
or DateDeleted > :reference_date
;
This works and gives the desired results, but it is not what I want, because I would like to enter the value for :reference_date only once.
Any ideas on how I could write a more elegant query?
What you have looks fine and only uses one bind variable; but if for some reason you have positional rather than named binds, then you could avoid the need to supply the bind value multiple times by using an inline view or a CTE:
with cte as (select :reference_date as reference_date from dual)
select myTable.*
from cte
join myTable
on myTable.DateCreated > cte.reference_date
or myTable.DateModified > cte.reference_date
or myTable.DateDeleted > cte.reference_date
;
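The inline-view version mentioned above is the same idea with the CTE folded into the FROM clause (a sketch, equivalent to the query just shown):
select myTable.*
from (select :reference_date as reference_date from dual) r
join myTable
on myTable.DateCreated > r.reference_date
or myTable.DateModified > r.reference_date
or myTable.DateDeleted > r.reference_date
;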
But again, I wouldn't consider that better than your original unless you have a really compelling reason and a problem supplying the bind value. Having to set it three times from a calling program probably wouldn't count as compelling, for example, for me anyway. And I'd check it didn't affect performance before deploying; I'd expect Oracle to optimise something like this, but the execution plan might be interesting.
I suppose you could rewrite that as:
select * from myTable
where greatest(DateCreated, DateModified, DateDeleted) > :reference_date;
if you absolutely had to, but I wouldn't. Your original query is, IMHO, much easier to understand than this one. By using a function you've also lost any chance of using an index, should one exist (unless you have a function-based index matching the new clause). Note too that GREATEST returns NULL if any of its arguments is NULL, so a row with, say, a NULL DateDeleted would be excluded even if another column passed the check, which changes the results.

Update Multiple Rows Using An Inequality

I'm trying to update one column in a subset of a table but I can't figure out how to do it in a clean and efficient manner.
Consider the following:
-- MyTable
id name flag
0 Steve 0
1 Bob 0
...
10500 Rick 0
I want to change flag to 1 but only for some of the cases. I tried to use
UPDATE MyTable
SET flag = 1
WHERE id <= 500
But obviously that does not work because the subquery returns more than one value. Technically, I could do it like this:
UPDATE MyTable SET flag = 1 WHERE id = 0
UPDATE MyTable SET flag = 1 WHERE id = 1
...
UPDATE MyTable SET flag = 1 WHERE id = 500
But who wants to do it like that? :) Is there a better way for me to format this query and only update those which match an inequality?
EDIT
To clarify exactly what's going on: by 'some of the cases' I mean only those rows which match the inequality, in this case id <= 500.
When I run UPDATE MyTable SET flag = 1 WHERE id <= 500 I get the following error:
Subquery returned more than 1 value.
This is not permitted when the subquery follows =, !=, <, <= , >, >=
or when the subquery is used as an expression.
Since your query does not have a subquery, I would suspect that you have a poorly written trigger on the table that expects only one record at a time to be updated. This needs to be fixed, as no trigger should ever be written on that assumption. Triggers in SQL Server need to perform only set-based operations, as they fire once per statement and work against the whole set of affected rows, not one row at a time.
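As a sketch of what a correctly written trigger looks like (the audit table and its columns are hypothetical; only id and flag come from the question):
CREATE TRIGGER trg_MyTable_Update ON MyTable
AFTER UPDATE
AS
BEGIN
    -- Set-based: read the whole inserted pseudo-table, which can hold
    -- one row or thousands, instead of assuming a single updated row
    INSERT INTO MyTableAudit (id, flag, ChangedAt)
    SELECT id, flag, GETDATE()
    FROM inserted;
END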
This is one of the primary use cases for conditional queries (a WHERE clause). You look for a way to uniquely identify the rows you wish to change and construct a query based on that identifying information. Your initial attempt applies this concept; however, the table as constructed is so general that composing a query that selects only the relevant rows is difficult, if not impossible. Perhaps try making the data more specific, preferably by partitioning it. For example, is there some way of grouping the data so that some people can be identified as "students" or "professors", etc.? What is flag actually supposed to indicate?
The query you have should work; if it is run on its own, it should just report the number of rows affected.
The long-hand version you wrote does exactly what your first query does. What environment are you running it in?

SQL find non-null columns

I have a table of time-series data of which I need to find all columns that contain at least one non-null value within a given time period. So far I am using the following query:
select max(field1),max(field2),max(field3),...
from series where t_stamp between x and y
Afterwards I check each field of the result if it contains a non-null value.
The table has around 70 columns and a time period can contain >100k entries.
I wonder if there is a faster way to do this (using only standard SQL).
EDIT:
Unfortunately, refactoring the table design is not an option for me.
The EXISTS operation may be faster since it can stop searching as soon as it finds any row that matches the criteria (vs. the MAX which you are using). It depends on your data and how smart your SQL server is. If most of your columns have a high rate of non-null data then this method will find rows quickly and it should run quickly. If your columns are mostly NULL values then your method may be faster. I would give them both a shot and see how they are each optimized and how they run. Also keep in mind that performance may change over time if the distribution of your data changes significantly.
Also, I've only tested this on MS SQL Server. I haven't had to code strict ANSI compatible SQL in over a year, so I'm not sure that this is completely generic.
SELECT
CASE WHEN EXISTS (SELECT * FROM Series WHERE t_stamp BETWEEN @x AND @y AND field1 IS NOT NULL) THEN 1 ELSE 0 END AS field1,
CASE WHEN EXISTS (SELECT * FROM Series WHERE t_stamp BETWEEN @x AND @y AND field2 IS NOT NULL) THEN 1 ELSE 0 END AS field2,
...
EDIT: Just to clarify, the MAX method might be faster since it could determine those values with a single pass through the data. Theoretically, the method here could as well, and potentially with less than a full pass, but your optimizer may not recognize that all of the subqueries are related, so it might do separate passes for each. That still might be faster, but as I said it depends on your data.
It would be faster with a different table design:
create table series (fieldno integer, t_stamp date);
select distinct fieldno from series where t_stamp between x and y;
Having a table with 70 "similar" fields is not generally a good idea.
When you say "a faster way to do this", if you mean a faster way for the query to run, then yes, here's how to do it: break it out into one query per column:
select top 1 field1 from series where t_stamp between x and y and field1 is not null
select top 1 field2 from series where t_stamp between x and y and field2 is not null
select top 1 field3 from series where t_stamp between x and y and field3 is not null
This way, you won't be doing a table scan across the entire table to find the maximum value. Instead, the database engine will stop as soon as it finds a non-null value. Assuming your data isn't 99% nulls, this should give you faster execution - but at the expense of more programming time to set this up.
How about this... You query for a list of field names that you can iterate through.
select 'field1' as fieldname from series
where field1 is not null and t_stamp between x and y
UNION
select 'field2' from series where field2 is not null and t_stamp between x and y
... etc
Then you have a recordset that contains only the names of the fields that have non-null data. You can loop over this recordset to build your real query as dynamic SQL and ignore fields that don't have any data. The select 'field2' branch will not return a row when no record matches its WHERE clause.
Edit: I think I misread the question... this will give you all the rows with a non-null value. I'll leave it here in case it helps someone, but it's not the answer to your question. Thanks @Pax
I think you want to use COALESCE:
SELECT ... WHERE COALESCE(field1, field2, field3) IS NOT NULL
For a start, this is a very bad idea with standard SQL since not all DBMSs sort with NULLs last.
There are all sorts of tricky ways you could do this and most would be interminably slow.
I'd suggest you (sort-of) normalize the database some more so that each of the columns is in a separate table which would make a select easier but that's probably not what you want.
After the edit of the question: if refactoring the table design is not an option, your given solution is probably the best, especially if you have indexes on all 70 columns.
Since that many indexes are likely to slow down inserts quite a bit, you may want to use a non-indexed table for maximum insert speed and transfer the data periodically (overnight?) to an indexed table which would run your selects at best speed (by avoiding a full table scan).
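A sketch of such a periodic transfer (both table names and the cutoff bind are assumed, not from the question):
-- Overnight job: move settled rows from the fast, unindexed staging
-- table into the indexed table that serves the SELECTs
INSERT INTO series_indexed
SELECT * FROM series_staging WHERE t_stamp < :cutoff;

DELETE FROM series_staging WHERE t_stamp < :cutoff;
-- run both statements in one transaction so no rows are lost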
select count(field1),count(field2),count(field3),...
from series where t_stamp between x and y
will tell you how many non-null values are in each column. Unfortunately, it's not much better than the way you're doing it now.
Try this:
SELECT CASE WHEN field1 IS NOT NULL THEN '' ELSE 'contains null' END AS field1_stat,
CASE WHEN field2 IS NOT NULL THEN '' ELSE 'contains null' END AS field2_stat,
... for every field to be checked
FROM series
WHERE t_stamp BETWEEN x AND y
GROUP BY CASE WHEN field1 IS NOT NULL THEN '' ELSE 'contains null' END,
CASE WHEN field2 IS NOT NULL THEN '' ELSE 'contains null' END
... etc
This will give you a summary of the combinations of NULL fields that occur in the table.

What's the most efficient way to check the presence of a row in a table?

Say I want to check if a record exists in a MySQL table. I'd run a query and check the number of rows returned. If 0 rows are returned, do this; otherwise, do that.
SELECT * FROM table WHERE id=5
SELECT id FROM table WHERE id=5
Is there any difference at all between these two queries? Is effort spent in returning every column, or is effort spent in filtering out the columns we don't care about?
SELECT COUNT(*) FROM table WHERE id=5
Is a whole new question. Would the server grab all the values and then count the values (harder than usual), or would it not bother grabbing anything and just increment a variable each time it finds a match (easier than usual)?
I think I'm making a lot of false assumptions about how MySQL works, but that's the meat of the question! Where am I wrong? Educate me, Stack Overflow!
Optimizers are pretty smart (generally). They typically only grab what they need so I'd go with:
SELECT COUNT(1) FROM mytable WHERE id = 5
The most explicit way would be
SELECT CASE WHEN EXISTS (SELECT 1 FROM table WHERE id = 5) THEN 1 ELSE 0 END
If there is an index on (or starting with) id, it will only search, with maximum efficiency, for the first entry in the index it can find with that value. It won't read the record.
If you SELECT COUNT(*) (or COUNT anything else) it will, under the same circumstances, count the index entries, but not read the records.
If you SELECT *, it will read all the records.
Limit your results to at most one row by appending LIMIT 1, if all you want to do is check the presence of a record.
SELECT id FROM table WHERE id=5 LIMIT 1
This will definitely ensure that no more than one row is returned or processed. In my experience, LIMIT 1 (or TOP 1, depending on the DB) makes a big performance difference when checking for the existence of a row in a large table.
EDIT: I think I misread your question, but I'll leave my answer here anyway if it's of any help.
I would think this
SELECT null FROM table WHERE id = 5 LIMIT 1;
would be faster than this
SELECT 1 FROM table WHERE id = 5 LIMIT 1;
but the timer says the winner is "SELECT 1".
For the first two queries, most people will generally say: always specify exactly what you need and leave out the rest. It isn't just query effort; bandwidth is also spent returning data that you aren't going to do anything with.
As for counting, the previous answers will do for your result set, unless you're working with an interface that reports affected rows; that can sometimes be used to find out how many rows the last query returned. You'll need to look at your interface documentation for how to get that information.
The difference between your 3 queries depends on how you've built your index. Only returning the primary key is likely to be faster as MySQL will have your index in memory, and not have to hit disk. Adding the LIMIT 1 is also a good trick that will speed up the optimizer significantly in early 5.0.x branches and earlier.
Try EXPLAIN SELECT id FROM table WHERE id=5 and check the Extra column for the presence of Using index. If it's there, then your query is coming straight from the index and is going to be much faster.