T-SQL Stored Procedure: Performance of select count(*) vs. select count([uniqueId])

So, I'm looking at a stored procedure here, which has more than one line like the following pseudocode:
if(select count(*) > 0)
...
on tables having a unique id (or identifier, for making it more general).
Now, in terms of performance, is it more performant to change this clause
to
if(select count([uniqueId]) > 0)
...
where uniqueId is, e.g., an Idx containing double values?
An example:
Consider a table like Idx (double) | Name (String) | Address (String)
Now the 'Idx' is a foreign key which I want to join in a stored procedure.
So, in terms of performance: what is better here?
if(select count(*) > 0)
...
or
if(select count(Idx) > 0)
...
Or does the SQL engine change select count(*) to select count(Idx) internally, so that we do not have to bother about this? At first sight, I'd say that select count(Idx) would be more performant.

The two are slightly different. count(*) counts rows. count([uniqueid]) counts the number of non-NULL values of uniqueid. Because a unique constraint allows a NULL value, SQL Server actually needs to read the column to evaluate count([uniqueid]). This could add microseconds to a query, particularly if the page holding the id is not already in memory. count(*) carries no such requirement, which gives SQL Server more opportunities to optimize it.
As @lad2025 writes in a comment, the performant solution is to use if (exists . . .).
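To make the difference concrete, here is a minimal sketch. The table variable @Demo and its contents are made up for illustration (modelled on the Idx/Name layout from the question), and the multi-row VALUES list assumes SQL Server 2008 or later.

```sql
-- Hypothetical table variable, used only to illustrate the point.
DECLARE @Demo TABLE (Idx float NULL, Name varchar(50) NOT NULL);

INSERT INTO @Demo (Idx, Name)
VALUES (1.0, 'a'), (NULL, 'b'), (3.0, 'c');

-- COUNT(*) counts rows; COUNT(Idx) skips the row where Idx is NULL.
SELECT COUNT(*)   AS AllRows,      -- 3
       COUNT(Idx) AS NonNullIdx    -- 2
FROM @Demo;

-- For a pure "is there at least one row?" test, EXISTS can stop
-- at the first match instead of counting every row.
IF EXISTS (SELECT 1 FROM @Demo WHERE Idx = 3.0)
    PRINT 'at least one matching row';
```

In a stored procedure, the IF EXISTS form would take the place of each if(select count(*) > 0) check.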

SELECT t1.*
FROM Table1 t1
JOIN Table2 t2 ON t2.idx = t1.idx
will give you only the rows in t1 that match an idx value in Table2. I'm not sure there is a good reason to do an if(select count...).
If you are really interested in the performance of something like this, just create a temp table with a million rows and give it a go:
CREATE TABLE #TempTable (id int identity, txt varchar(50))
GO
INSERT #TempTable (txt) VALUES (@@IDENTITY)
GO 1000000

Related

Optimized way to check if record is present in table 1. If not then check table 2, else return default value

Asked in an interview:
I have 2 tables, one table has records like ID, Name, address. id(pk) is from 1 to 10000000.
Another table has records from 10000001 to 20000000.
I have to check if a particular ID is present in table 1 or table 2 and return corresponding result.
Because the tables are big, I have to think of an optimized way to do this.
DECLARE @ID BIGINT
SET @ID = 10000000
IF EXISTS (SELECT ID FROM TABLE1 WHERE ID = @ID)
    SELECT ID, NAME, ADDRESS FROM TABLE1 WHERE ID = @ID
ELSE IF EXISTS (SELECT ID FROM TABLE2 WHERE ID = @ID)
    SELECT ID, NAME, ADDRESS FROM TABLE2 WHERE ID = @ID
ELSE
    SELECT @ID
A few ideas off the top of my head.
In Hive, you can use a map-side join, which is much faster than a regular join when one table is large and the other is small (here the second "table" is just the id you are searching for).
You can also optimize the way you store the data. If such queries are frequent, keep the data sorted by the id column. A columnar format such as ORC keeps track of the range of ids in each file, which makes such queries faster.

'In' clause in SQL server with multiple columns

I have a component that retrieves data from database based on the keys provided.
However, I want my Java application to get the data for all keys in a single database hit to speed things up.
I can use 'in' clause when I have only one key.
While working on more than one key I can use below query in oracle
SELECT * FROM <table_name>
where (value_type,CODE1) IN (('I','COMM'),('I','CORE'));
which is similar to running
SELECT * FROM <table_name>
where value_type = 'I' and CODE1 = 'COMM'
and
SELECT * FROM <table_name>
where value_type = 'I' and CODE1 = 'CORE'
together.
However, using the 'in' clause as above gives the error below in SQL Server:
ERROR: An expression of non-boolean type specified in a context where a condition is expected, near ','.
Please let me know if there is any way to achieve the same in SQL Server.
This syntax doesn't exist in SQL Server. Use a combination of AND and OR.
SELECT *
FROM <table_name>
WHERE
    (value_type = 'I' AND CODE1 = 'COMM')
    OR (value_type = 'I' AND CODE1 = 'CORE')
(In this case, you could make it shorter, because value_type is compared to the same value in both combinations. I just wanted to show the pattern that works like IN in Oracle with multiple fields.)
When using IN with a subquery, you need to rephrase it like this:
Oracle:
SELECT *
FROM foo
WHERE
(value_type, CODE1) IN (
SELECT type, code
FROM bar
WHERE <some conditions>)
SQL Server:
SELECT *
FROM foo
WHERE
EXISTS (
SELECT *
FROM bar
WHERE <some conditions>
AND foo.value_type = bar.type
AND foo.CODE1 = bar.code)
There are other ways to do it, depending on the case, like inner joins and the like.
If you have up to 1000 tuples you want to check against and you're using SQL Server 2008+, you can use a table value constructor and join against it. You can only specify up to 1000 rows in a table value constructor, hence the 1000-tuple limitation. Here's how it would look in your situation:
SELECT <table_name>.* FROM <table_name>
JOIN ( VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b) ON a = value_type AND b = CODE1;
This is only a good idea if your list of values is going to be unique, otherwise you'll get duplicate rows. I'm not sure how the performance of this compares to using many ANDs and ORs, but the SQL query is at least much cleaner to look at, in my opinion.
You can also write this with EXISTS instead of JOIN. That may have different performance characteristics, and it avoids producing duplicate results if your values aren't unique. It may be worth trying both EXISTS and JOIN on your use case to see which is a better fit. Here's how EXISTS would look:
SELECT * FROM <table_name>
WHERE EXISTS (
SELECT 1
FROM (
VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b)
WHERE a = value_type AND b = CODE1
);
In conclusion, I think the best choice is to create a temporary table and query against that. But sometimes that's not possible, e.g. your user lacks the permission to create temporary tables, and then a table value constructor may be your best choice. Use EXISTS or JOIN, depending on which gives you better performance on your database.
Normally you cannot do it, but you can use the following technique.
SELECT * FROM <table_name>
where (value_type+'/'+CODE1) IN (('I'+'/'+'COMM'),('I'+'/'+'CORE'));
A better solution is to avoid hardcoding your values and put them in a temporary or persistent table:
CREATE TABLE #t (ValueType VARCHAR(16), Code VARCHAR(16))
INSERT INTO #t VALUES ('I','COMM'),('I','CORE')
SELECT DT.*
FROM <table_name> DT
JOIN #t T ON T.ValueType = DT.value_type AND T.Code = DT.CODE1
This way you avoid storing data in your code (persistent table version) and can easily modify the filters without changing the code.
I think you can try this: combine AND and OR at the same time.
SELECT
    *
FROM
    <table_name>
WHERE
    value_type = 'I'
    AND (CODE1 = 'COMM' OR CODE1 = 'CORE')
What you can do is 'join' the columns as a string, and pass your values also combined as strings.
where (cast(column1 as text) ||','|| cast(column2 as text)) in (?1)
The other way is to do multiple ands and ors.
I had a similar problem in MS SQL, but a little different. Maybe it will help somebody in the future. In my case I found this solution (not the full code, just an example):
SELECT Table1.Campaign
,Table1.Coupon
FROM [CRM].[dbo].[Coupons] AS Table1
INNER JOIN [CRM].[dbo].[Coupons] AS Table2 ON Table1.Campaign = Table2.Campaign AND Table1.Coupon = Table2.Coupon
WHERE Table1.Coupon IN ('0000000001', '0000000002') AND Table2.Campaign IN ('XXX000000001', 'XYX000000001')
Of course, I have indexes on Coupon and Campaign in the table for fast searching.
Compute it in MS Sql
SELECT * FROM <table_name>
where value_type + '|' + CODE1 IN ('I|COMM', 'I|CORE');

Inner-Join on two columns where one column has a single trailing character

Hi I'm new to SQL and I have 2 tables that I am trying to do an inner-join with.
First table:
ID-Number | CustomerName
Second table:
ID-Number (ID with a single trailing character) | CustomerDevice
Questions
What would be the best-performing way to execute the inner-join on both tables' ID-Number?
Is there a method to remove the trailing character within the inner-join command?
You don't have much choice. Here is how you can express the logic:
select . . .
from t1 join
     t2
     on t2.id like t1.id + '_';
Unfortunately, this may not make use of indexes. (Also note that + for string concatenation is SQL Server-specific).
You might be able to rewrite the query as:
on t1.id = left(t2.id, len(t2.id) - 1)
This should be able to use an index on t1(id).
The best approach is to fix the data, so your ids are the same type, same length, and have a properly declared foreign key relationship. Another alternative available in SQL Server is an index on a computed column:
alter table t2 add realId as (left(id, len(id) - 1));
create index idx_t2_realId on t2(realId);
Then write the join logic using realId.
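As a rough sketch, the join on the computed column could look like the following. It assumes the t1/t2 tables and the realId column defined above; the selected columns are only placeholders.

```sql
-- Join on the indexed computed column instead of stripping the
-- trailing character inside the ON clause.
SELECT t1.id,
       t2.id AS device_id   -- original t2 id, trailing character included
FROM t1
JOIN t2
  ON t1.id = t2.realId;
```

Because realId is an ordinary indexed column from the optimizer's point of view, both sides of the comparison can be looked up through an index.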
Would this work?
SELECT
    t1.[ID-Number],
    t1.CustomerName,
    t2.CustomerDevice
FROM t1
INNER JOIN t2 ON t1.[ID-Number] = LEFT(t2.[ID-Number], LEN(t2.[ID-Number]) - 1)
EDIT: Forgot the 1
Given that the table Customer has this column
ID_number int not null;
And the table Device has this column
ID_number varchar(15);
And we know that Device.ID_number, if it is not NULL, is always equal to some Customer.ID_number with a letter appended, then (SQL Server):
SELECT *
FROM Customer c
JOIN Device d
ON c.ID_number = CAST(SUBSTRING(d.ID_number, 1, LEN(d.ID_number) - 1) AS int)
More robust solutions that allow for more possibilities in the data require more defensive coding. You may want to define a scalar function to process Device.ID_number.

Check whether a table contains rows or not sql server 2005

How to Check whether a table contains rows or not sql server 2005?
For what purpose?
Quickest for an IF would be IF EXISTS (SELECT * FROM Table)...
For a result set, SELECT TOP 1 1 FROM Table returns either zero or one rows
For exactly one row with a count (0 or non-zero), SELECT COUNT(*) FROM Table
Also, you can use exists
select case when exists (select 1 from table)
then 'contains rows'
else 'doesnt contain rows'
end
or to check if there are child rows for a particular record :
select * from Table t1
where exists(
select 1 from ChildTable t2
where t1.id = t2.parentid)
or in a procedure
if exists(select 1 from table)
begin
-- do stuff
end
As others have said, you can use something like this:
IF NOT EXISTS (SELECT 1 FROM Table)
BEGIN
--Do Something
END
ELSE
BEGIN
--Do Another Thing
END
For the best performance, use a specific column name instead of *, for example:
SELECT TOP 1 <columnName>
FROM <tableName>
This returns just one column instead of the whole column list, which can save some time.
Returning only the first row, if there is one, makes it faster still. The result is a single value if the table has any rows, or no value if it has none.
If the table is accessed in a distributed manner, which is most probably the case, then transporting just one value from the server to the client is much faster.
You should also choose the column wisely: pick one that takes as few resources as possible to read.
Can't you just count the rows using select count(*) from table (or an indexed column instead of * if speed is important)?
If not then maybe this article can point you in the right direction.
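Fast:
SELECT TOP (1) CASE
    WHEN t.<not_null_column> IS NULL
    THEN 'empty table'
    ELSE 'not empty table'
    END AS info
-- The one-row derived table guarantees a result row even when the table is empty.
FROM (SELECT 1 AS one) AS d
LEFT JOIN <table_name> AS t ON 1 = 1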

Looking for SQL constraint: SELECT COUNT(*) from tBoss < 2

I'd like to limit the entries in a table. Let's say in table tBoss. Is there a SQL constraint that checks how many tuples are currently in the table? Like
SELECT COUNT(*) from tBoss < 2
Firebird says:
Invalid token.
Dynamic SQL Error.
SQL error code = -104.
Token unknown - line 3, column 8.
SELECT.
You could do this with a check constraint and a scalar function. Here's how I built a sample.
First, create a table:
CREATE TABLE MyTable
(
MyTableId int not null identity(1,1)
,MyName varchar(100) not null
)
Then create a function for that table. (You could add the row count limit as a parameter if you want more flexibility.)
CREATE FUNCTION dbo.MyTableRowCount()
RETURNS int
AS
BEGIN
    DECLARE @HowMany int
    SELECT @HowMany = count(*)
    from MyTable
    RETURN @HowMany
END
Now add a check constraint using this function to the table
ALTER TABLE MyTable
add constraint CK_MyTable__TwoRowsMax
check (dbo.MyTableRowCount() < 3)
And test it:
INSERT MyTable (MyName) values ('Row one')
INSERT MyTable (MyName) values ('Row two')
INSERT MyTable (MyName) values ('Row three')
INSERT MyTable (MyName) values ('Row four')
A disadvantage is that every time you insert to the table, you have to run the function and perform a table scan... but so what, the table (with clustered index) occupies two pages max. The real disadvantage is that it looks kind of goofy... but everything looks goofy when you don't understand why it has to be that way.
(The trigger solution would work, but I like to avoid triggers whenever possible.)
Does your database have triggers? If so, add a trigger that rolls back any insert that would leave more than 2 rows in the table (SQL Server syntax shown):
Create Trigger MyTrigName
On tBoss
For Insert
As
If (Select Count(*) From tBoss) > 2
    RollBack Transaction
But to answer your question directly: to get the predicate you want, just put the select subquery inside parentheses, like this:
[First part of sql statement ]
Where (SELECT COUNT(*) from tBoss) < 2
To find multiples in a database your best bet is a sub-query for example: (Note I am assuming you are looking to find duplicated rows of some sort)
SELECT id FROM tBoss WHERE id IN ( SELECT id FROM tBoss GROUP BY id HAVING count(*) > 1 )
where id is the possibly duplicated column
SELECT COUNT(*) FROM tBoss WHERE someField < 2 GROUP BY someUniqueField