SQL Server features/commands that most developers are unaware of [duplicate] - sql

Possible Duplicate:
Hidden Features of SQL Server
I've worked as a .NET developer for a while now, predominantly against a SQL Server database for a little over 3 years. I feel that I have a fairly decent grasp of SQL Server from a development standpoint, but I am ashamed to admit that I just learned today about "WITH TIES" from this answer - Top 5 with most friends.
It is humbling to see questions and answers like this on SO because it helps me realize that I really don't know as much as I think I do, and it re-energizes my will to learn more. So I figured: what better way than to ask the masses of experts for input on other handy commands/features?
What is the most useful feature/command that the average developer is probably unaware of?
BTW - if you are like I was and don't know what "WITH TIES" is for, here is a good explanation. You'll see quickly why I was ashamed I was unaware of it. I could see where it could be useful though. - http://harriyott.com/2007/06/with-ties-sql-server-tip.aspx
I realize that this is a subjective question, so please allow for at least a few answers before you close it. :) I'll try to edit my question to keep a running list of your responses. Thanks.
[EDIT] - Here is a summary of the responses. Please scroll down for more information. Thanks again, guys/gals.
MERGE - A single command to INSERT / UPDATE / DELETE into a table from a row source.
FILESTREAM feature of SQL Server 2008 allows storage of and efficient access to BLOB data using a combination of SQL Server 2008 and the NTFS file system
CAST - get a date without a time portion
Group By - I gotta say you should definitely know this already
SQL Server Management Studio
Transactions
The sharing of local scope temp tables between nested procedure calls
INSERT INTO
MSDN
JOINS
PIVOT and UNPIVOT
WITH(FORCESEEK) - forces the query optimizer to use only an index seek operation as the access path to the data in the table.
FOR XML
COALESCE
How to shrink the database and log files
Information_Schema
SET IMPLICIT_TRANSACTIONS in Management Studio 2005
Derived tables and common table expressions (CTEs)
OUTPUT clause - allows access to the "virtual" tables called inserted and deleted (like in triggers)
CTRL + 0 to insert null
Spatial Data in SQL Server 2008

FileStream in SQL Server 2008: FILESTREAM feature of SQL Server 2008 allows storage of and efficient access to BLOB data using a combination of SQL Server 2008 and the NTFS file system.
Creating a Table for Storing FILESTREAM Data
Once the database has a FILESTREAM filegroup, tables can be created that contain FILESTREAM columns. As mentioned earlier, a FILESTREAM column is defined as a varbinary(max) column that has the FILESTREAM attribute. The following code creates a table with a single FILESTREAM column:
USE Production;
GO
CREATE TABLE DocumentStore (
    DocumentID INT IDENTITY PRIMARY KEY,
    Document VARBINARY(MAX) FILESTREAM NULL,
    DocGUID UNIQUEIDENTIFIER NOT NULL ROWGUIDCOL UNIQUE DEFAULT NEWID()
) FILESTREAM_ON FileStreamGroup1;
GO

In SQL Server 2008 (and in Oracle 10g): MERGE.
A single command to INSERT / UPDATE / DELETE into a table from a row source.
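For illustration, a minimal sketch (the Target and Source tables and their ID/Name columns are invented for this example, not taken from the answer):
-- Synchronize Target with Source in one statement
MERGE INTO Target AS t
USING Source AS s
    ON t.ID = s.ID
WHEN MATCHED THEN
    UPDATE SET Name = s.Name                       -- row exists in both: update it
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, Name) VALUES (s.ID, s.Name)        -- row only in Source: insert it
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;                                        -- row only in Target: delete it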
To generate a list of numbers from 1 to 31 (say, for a calendar):
WITH cal AS
(
    SELECT 1 AS day
    UNION ALL
    SELECT day + 1
    FROM cal
    WHERE day <= 30
)
SELECT day
FROM cal
A single-column index with a DESC clause on a clustered table can be used for sorting on column DESC, cluster_key ASC:
CREATE INDEX ix_column_desc ON mytable (column DESC)

SELECT TOP 10 *
FROM mytable
ORDER BY column DESC, pk
-- Uses the index

SELECT TOP 10 *
FROM mytable
ORDER BY column, pk
-- Doesn't use the index
CROSS APPLY and OUTER APPLY: let you join row sources that depend on the values of the table being joined:
SELECT *
FROM mytable
CROSS APPLY my_tvf(mytable.column1) tvf

SELECT *
FROM mytable
CROSS APPLY
(
    SELECT TOP 5 *
    FROM othertable
    WHERE othertable.column2 = mytable.column1
) q
EXCEPT and INTERSECT operators: allow matching rows on conditions that treat NULLs as equal:
DECLARE @var1 INT
DECLARE @var2 INT
DECLARE @var3 INT

SET @var1 = 1
SET @var2 = NULL
SET @var3 = NULL

SELECT col1, col2, col3
FROM mytable
INTERSECT
SELECT @var1, @var2, @var3
-- selects rows with `col1 = 1`, `col2 IS NULL` and `col3 IS NULL`

SELECT col1, col2, col3
FROM mytable
EXCEPT
SELECT @var1, @var2, @var3
-- selects all other rows
WITH ROLLUP clause: adds a grand total row for all grouped rows
SELECT month, SUM(sale)
FROM mytable
GROUP BY month WITH ROLLUP

Month   SUM(sale)
-----   ---------
Jan     10,000
Feb     20,000
Mar     30,000
NULL    60,000   -- the grand total added by `WITH ROLLUP`

It's amazing how many people work unprotected with SQL Server because they don't know about transactions!
BEGIN TRAN
...
COMMIT / ROLLBACK
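A minimal sketch of the usual pattern (the Accounts table and its columns are made up here purely for illustration):
BEGIN TRY
    BEGIN TRAN
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2
    COMMIT TRAN                  -- both updates succeed together
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRAN            -- or neither is applied
END CATCH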

After creating a #TempTable in a procedure, it is available in all stored procedures that are then called from the original procedure. It is a nice way to share set data between procedures. See: http://www.sommarskog.se/share_data.html

COALESCE(): it returns the first non-NULL value from its arguments, so you can supply a fallback value to use in case a field is NULL.
For example, if you have a table with City, State, and Zipcode columns, you can use COALESCE() to return each address as a single string:
City     | State | Zipcode
Houston  | Texas | 77058
Beaumont | Texas | NULL
NULL     | Ohio  | NULL
If you were to run this query against the table:
SELECT COALESCE(City, '') + ' ' + COALESCE(State, '') + ' ' + COALESCE(Zipcode, '')
FROM tb_addresses
Would return:
Houston Texas 77058
Beaumont Texas
Ohio
You can also use it to collapse rows into a single string, e.g.:
DECLARE @addresses VARCHAR(MAX)
SELECT @addresses = COALESCE(@addresses + ', ', '') + COALESCE(City, '') + ' '
    + COALESCE(State, '') + ' ' + COALESCE(Zipcode, '')
FROM tb_addresses
SELECT @addresses
Would return:
Houston Texas 77058, Beaumont Texas, Ohio

A lot of SQL Server developers still don't seem to know about the OUTPUT clause (SQL Server 2005 and newer) on the DELETE, INSERT and UPDATE statements.
It can be extremely useful to know which rows have been INSERTed, UPDATEd, or DELETEd, and the OUTPUT clause allows you to do this very easily - it gives access to the "virtual" tables called inserted and deleted (like in triggers):
DELETE FROM (table)
OUTPUT deleted.ID, deleted.Description
WHERE (condition)
If you're inserting values into a table which has an INT IDENTITY primary key field, with the OUTPUT clause, you can get the inserted new ID right away:
INSERT INTO MyTable(Field1, Field2)
OUTPUT inserted.ID
VALUES (Value1, Value2)
And if you're updating, it can be extremely useful to know what changed - in this case, inserted represents the new values (after the UPDATE), while deleted refers to the old values before the UPDATE:
UPDATE (table)
SET field1 = value1, field2 = value2
OUTPUT inserted.ID, deleted.field1, inserted.field1
WHERE (condition)
If a lot of info will be returned, the output of OUTPUT can also be redirected to a temporary table or a table variable (OUTPUT INTO #myInfoTable).
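For example, a small sketch (the #myInfoTable temp table and the Description column are assumed for illustration):
CREATE TABLE #myInfoTable (ID INT, Description VARCHAR(100))

DELETE FROM MyTable
OUTPUT deleted.ID, deleted.Description INTO #myInfoTable (ID, Description)
WHERE SomeColumn = 'some value'

-- inspect what was actually deleted
SELECT * FROM #myInfoTable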
Extremely useful - and very little known!
Marc

There are a handful of ways to get a date without a time portion; here's one that is quite performant:
SELECT CAST(FLOOR(CAST(getdate() AS FLOAT))AS DATETIME)
Or, starting with SQL Server 2008:
SELECT CAST(getdate() AS DATE) AS TodaysDate

The "Information_Schema" gives me a lot of views that I can use to gather information about the SQL objects tables, procedures, views, etc.

If you are using Management Studio 2005 you can have it automatically execute your query as a transaction. In a new query window go to Query->Query Options. Then click on the ANSI "tab" (on the left). Check SET IMPLICIT_TRANSACTIONS. Click OK. Now if you run any query in this current query window it will run as a transaction and you must manually ROLLBACK or COMMIT it before continuing. Additionally, this only works for the current query window; pre-existing/new query windows will need to have the option set.
I've personally found it useful. However, it's not for the faint of heart. You must remember to ROLLBACK or COMMIT your query. It will NOT tell you that you have a pending transaction if you switch to a different query window (or even a new one). However, it will tell you if you try to close the query window.
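The same behavior can also be switched on directly in T-SQL; a rough sketch of the workflow (the table and values are just placeholders):
SET IMPLICIT_TRANSACTIONS ON

UPDATE MyTable SET Field1 = 'new value' WHERE ID = 1   -- implicitly opens a transaction

-- inspect the result, then finish with one of:
COMMIT
-- ROLLBACK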

PIVOT and UNPIVOT
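As a quick illustration, a minimal PIVOT sketch (the Sales table and its SaleMonth/Amount columns are invented for the example):
SELECT *
FROM (SELECT SaleMonth, Amount FROM Sales) AS src
PIVOT (SUM(Amount) FOR SaleMonth IN ([Jan], [Feb], [Mar])) AS p
-- one row per remaining grouping column, with Jan/Feb/Mar as columns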

FOR XML
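For example, to return a result set as XML (re-using the tb_addresses example table from the COALESCE answer above):
SELECT City, State, Zipcode
FROM tb_addresses
FOR XML AUTO
-- or FOR XML PATH('address'), ROOT('addresses') for finer control over the shape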

BACKUP LOG <DB_NAME> WITH TRUNCATE_ONLY
DBCC SHRINKFILE(<DB_LOG_NAME>, <DESIRED_SIZE>)
When I started to manage very large databases on MS SQL Server and the log file had grown to over 300 GB, these statements saved my life. In most cases shrinking the database alone will have no effect.
Before running them, be sure to make a full backup of the log, and after running them do a full backup of the database (the restore sequence is no longer valid).

Most SQL Server developers should know about and use derived tables and common table expressions (CTEs).
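As a quick reminder of the two shapes, a sketch using made-up Customers/Orders tables (not from any answer here):
-- Derived table
SELECT c.Name, o.Total
FROM Customers c
JOIN (SELECT CustomerID, SUM(Amount) AS Total
      FROM Orders
      GROUP BY CustomerID) o
  ON o.CustomerID = c.CustomerID

-- The equivalent CTE, often easier to read and reuse
;WITH OrderTotals AS
(
    SELECT CustomerID, SUM(Amount) AS Total
    FROM Orders
    GROUP BY CustomerID
)
SELECT c.Name, ot.Total
FROM Customers c
JOIN OrderTotals ot ON ot.CustomerID = c.CustomerID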

The documentation.
Sad to say, but I have come to the conclusion that the most hidden feature that developers are unaware of is the documentation on MSDN. Take, for instance, a Transact-SQL verb like RESTORE. The BOL covers not only the syntax and arguments of RESTORE; that is only the tip of the iceberg. The BOL also covers:
the in depth fundamentals of recovery: Understanding How Restore and Recovery of Backups Work in SQL Server.
end-to-end scenarios on how to deploy a recovery strategy: Implementing Restore Scenarios for SQL Server Databases.
the issues around system databases: Considerations for Backing Up and Restoring System Databases.
optimizing the recovery procedures: Optimizing Backup and Restore Performance in SQL Server.
understanding how to do a restore: Backing Up and Restoring How-to Topics (Transact-SQL).
more corner cases and uncommon scenarios, there are examples like Example: Piecemeal Restore of Only Some Filegroups (Full Recovery Model).
The list goes on and on, and this is just one single topic (backup and restore). Every feature of SQL Server gets similar coverage. Granted, not everything gets the level of detail that backup and recovery gets, but everything is documented and there are How To topics for every feature.
The amount of information available is just ludicrous. Yet the documentation is one of the most underused resources, hence my vote for it being a hidden feature.

How about materialised views? Add a clustered index to a view and you effectively create a table containing duplicate data that is automatically kept up to date. It slows down inserts and updates because you are doing the operation twice, but selecting a specific subset becomes faster. And apparently the database optimiser can use it without you having to reference it explicitly.
Is a view faster than a simple query?
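A minimal sketch of an indexed view (the dbo.Sales table and its columns are assumed; SCHEMABINDING, two-part names, and COUNT_BIG(*) are required, and the aggregated Amount column is assumed NOT NULL):
CREATE VIEW dbo.SalesByMonth
WITH SCHEMABINDING
AS
SELECT SaleMonth, SUM(Amount) AS TotalAmount, COUNT_BIG(*) AS RowCnt
FROM dbo.Sales
GROUP BY SaleMonth
GO
-- the unique clustered index is what materialises the view
CREATE UNIQUE CLUSTERED INDEX IX_SalesByMonth ON dbo.SalesByMonth (SaleMonth)
GO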

It sounds silly to say, but I've looked at a lot of queries where I just asked myself: does this person simply not know what GROUP BY is? I'm not sure if most developers are unaware of it, but it comes up enough that I wonder sometimes.

Use Ctrl+0 to insert a NULL value into a cell when editing table data in Management Studio.

WITH (FORCESEEK) which forces the query optimizer to use only an index seek operation as the access path to the data in the table.
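For example (the Sales table and SaleDate column are invented; the hint assumes a usable index exists, otherwise the query will error):
SELECT *
FROM Sales WITH (FORCESEEK)   -- refuse any plan that scans this table
WHERE SaleDate >= '2009-01-01'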

Spatial Data in SQL Server 2008, i.e. storing Lat/Long data in a geography datatype and being able to calculate/query using the functions that go along with it.
It supports both Planar and Geodetic data.
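A small sketch of what that looks like (the coordinates are just example values):
DECLARE @seattle GEOGRAPHY = geography::Point(47.6062, -122.3321, 4326)
DECLARE @portland GEOGRAPHY = geography::Point(45.5152, -122.6784, 4326)

-- STDistance on the geography type returns the distance in meters
SELECT @seattle.STDistance(@portland) AS DistanceInMeters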

Why am I tempted to say JOINS?
Derived tables are one of my favorites. They perform so much better than correlated subqueries, but many people continue to use correlated subqueries instead.
Example of a derived table:
SELECT f.FailureFieldName, f.RejectedValue, f.RejectionDate,
       ft.FailureDescription, f.DataTableLocation, f.RecordIdentifierFieldName,
       f.RecordIdentifier, fs.StatusDescription
FROM dataFailures f
JOIN (SELECT MAX(dataFlowinstanceid) AS dataFlowinstanceid
      FROM dataFailures
      WHERE dataflowid = 13) a
  ON f.dataFlowinstanceid = a.dataFlowinstanceid
JOIN FailureType ft ON f.FailureTypeID = ft.FailureTypeID
JOIN FailureStatus fs ON f.FailureStatusID = fs.FailureStatusID

When I first started working as programmer, I started with using SQL Server 2000. I had been taught DB theory on Oracle and MySQL so I didn't know much about SQL Server 2000.
But, as it turned out, neither did the development staff I joined: they didn't know that you could convert datetime (and related) data types to formatted strings with built-in functions. They were using a very inefficient custom function they had developed. I was more than happy to show them the error of their ways... (I'm not with that company anymore... :-D)
With that anecdote out of the way, I wanted to add this to the list:
select Convert(varchar, getdate(), 101) -- 08/06/2009
select Convert(varchar, getdate(), 110) -- 08-06-2009
These are the two I use most often. There are a bunch more: CAST and CONVERT on MSDN

Related

Using "Select INTO" for million of records

I'm using the SELECT INTO statement to create my SQL statement result.
SELECT fields
INTO [newtable]
FROM table1, table2, table3, table4
[where clause to filter records from four tables]
I would like to know how this statement would impact performance if there are millions of records to insert into the new table. Will there be an OutOfMemory error in this case?
My environment is SQL Server 2008 R2.
Will there be an OutOfMemory error in this case?
SQL Server is going to guess how many records you are inserting. It will then ask the OS for that much memory, hopefully avoiding an out of memory situation.
If SQL Server isn't given enough memory, or if it asks for too little, it will temporarily store the excess data in TempDB. As the name suggests, TempDB is a temporary database that lives on disk. https://msdn.microsoft.com/en-us/library/ms190768.aspx
To ensure good performance, put as much RAM as possible on your database server.
To ensure good performance, put your TempDB on its own hard drive. And make sure you buy an SSD hard drive, not a cheap spinner.
If you look at the execution plan in SQL Server Management Studio, you can see details such as how much memory the query used. http://www.developer.com/db/understanding-a-sql-server-query-execution-plan.html
If the memory SQL Server used is very different than the memory it asked for, your statistics are probably out of date. (Search for "SQL Server Statistics" for more info.)
It doesn't store the data in memory while transferring it, so there shouldn't be memory issues.
Logging and IO performance will take a hit as you are creating millions of rows. Performance will also degrade if the joins in your SELECT are complex (if the SELECT alone takes ages, the INSERT will take even longer).
SELECT INTO is typically used to generate temp tables or to copy another table (data and/or structure).
SELECT INTO is usually much better performance-wise since it is minimally logged.
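A common related trick (a sketch, not from the question): SELECT INTO with a predicate that matches no rows copies only the column structure, not the data:
SELECT *
INTO newtable_empty
FROM table1
WHERE 1 = 0   -- no rows match, so only the columns and types are copied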
Create a temp table with the necessary fields from the source tables, e.g.:
ex:
create table #temp_data
(
empid int,
emp_name varchar(50),
strt_date datetime,
emp_addrss varchar(50),
emp_sal numeric
)
Next, insert the data from the four tables into those columns, with whatever conditions are required:
insert #temp_data values
(
001,'raj','2013-04-03','hyderabad',25000
)
Update rows in the temp table as needed, for example:
UPDATE #temp_data
SET emp_addrss = 'hyderabad'
WHERE empid = 001
Now write the query as per your requirements, with a WHERE clause and so on:
SELECT empid, emp_name, strt_date, emp_addrss, emp_sal
FROM #temp_data
WHERE empid = 001
Result:
empid emp_name strt_date emp_addrss emp_sal
1 raj 2013-04-03 00:00:00.000 hyderabad 25000

SQL query on a table with 30 million records

I have been having problems building a table in my local SQL Server. Originally it was causing tempdb to become full and throw an exception. The query has a lot of joins and outer applies, so to find where specifically the problem lay, I did a select on the first table in the query to determine how long it took. That was fast, so I added the table from the first join and reran, and continued like this until I found the table that stalled.
I found the problem (or at least the first problem) was with the shipper_container table. This table is huge: just doing a SELECT on it alone (it has only 5 columns) throws a System.OutOfMemoryException. It cuts out at 16 million records, though the table has 30 million rows and is 1.2 GB in size. That doesn't seem so big that SQL Server Management Studio couldn't handle it.
Using a WHERE clause to collect values between 1 January and 10 January 2015 still resulted in a query that had run for over 5 minutes and was still executing when I cancelled it. I have also added indexes on each of the selected columns, and this did not improve performance either.
Here is the SQL Query. You can see I have commented out the other parameters that have yet to be added in other joins and outer applies.
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
DECLARE @Shipper_Key INT = NULL
DECLARE @Part_Key INT = NULL

SET @startDate = '2015-01-01'
SET @endDate = '2015-01-10'

SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

INSERT Shipped_Container
(
    Ship_Date,
    Invoice_Quantity,
    Shipper_No,
    Serial_No,
    Truck_Key,
    Shipper_Key
)
SELECT
    S.Ship_Date,
    SC.Quantity,
    S.Shipper_No,
    SC.Serial_No,
    S.Truck_Key,
    S.Shipper_Key
FROM Shipper AS S
JOIN Shipper_Line AS SL
    --ON SL.PCN = S.PCN
    ON SL.Shipper_Key = S.Shipper_Key
JOIN Shipper_Container AS SC
    --ON SC.PCN = SL.PCN
    ON SC.Shipper_Line_Key = SL.Shipper_Line_Key
WHERE S.Ship_Date >= @startDate AND S.Ship_Date <= @endDate
    AND S.Shipper_Key = ISNULL(@Shipper_Key, S.Shipper_Key)
    AND SL.Part_Key = ISNULL(@Part_Key, SL.Part_Key)
The server instance is run on the local network - could this be an issue? I have minimal experience at this and would really appreciate help that is as detailed and clear as possible. Often in SQL forums people jump right into technical details I don't follow so well.
Don't do a SELECT ... FROM yourtable in SQL Server Management Studio when it returns hundreds of thousands or millions of rows. 1 GB of data gets a lot bigger when the system has to draw and show it on screen in the Management Studio data sheet.
The server instance is run on the local network
When you do a SELECT ... FROM yourtable in SSMS, the server must send all the data to your laptop/desktop. This is quite a lot of unneeded pressure on the network.
It should not be an issue when you insert because everything stays on the server. However, staying on the server does not mean it will be fast if your data model is not good enough.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
You may get dirty reads if you use that... It may be better to remove it unless you know why it is there and why you need it.
I have also added indexes on each of the select parameters and this did not increase performance either
If you mean indexes on :
S.Ship_Date,
SC.Quantity,
S.Shipper_No,
SC.Serial_No,
S.Truck_Key,
S.Shipper_Key
What are their definitions ?
If they are individual indexes on 1 column, you can drop indexes on SC.Quantity, S.Shipper_No, SC.Serial_No and S.Truck_Key. They are not used.
Ship_Date and Shipper_Key may be useful. It all depends on your model and existing primary keys (which you need to describe; see below).
It will help to give a more accurate answer if you could tell us:
the relation between your 3 tables (which fields link A to B and in which direction)
the primary key on your 3 tables
a complete list of all your indexes(and columns) on your 3 tables
If none of your indexes are useful, or if they are missing, it will most likely read all 3 tables in full and try to match them. Because the data is pretty big, it does not have enough memory to process it and it uses tempdb to store intermediary data.
For now I will suppose that shipper_key + PCN is the primary key on each table.
I think you can try that:
You can create an index on S.Ship_Date
CREATE INDEX IX_Shipper_Ship_Date ON Shipper (Ship_Date) -- subject to updates according to your primary key
The query optimizer may not use the indexes (if they exist) with such a WHERE clause:
AND S.Shipper_Key = ISNULL(@Shipper_Key, S.Shipper_Key)
AND SL.Part_Key = ISNULL(@Part_Key, SL.Part_Key)
Instead you can use:
AND (S.Shipper_Key = @Shipper_Key OR @Shipper_Key IS NULL)
AND (SL.Part_Key = @Part_Key OR @Part_Key IS NULL)
It would help to have indexes on Shipper_Key and PCN
Finally
As I already said above, we need to know more about your data model (CREATE TABLE ...), primary keys and indexes (CREATE INDEX ...). You can create a model here http://sqlfiddle.com/ with all 3 CREATE TABLE statements and their indexes, then add the link here.
In SSMS, you can right click on a table and go to Script Table as / Create To / New Query Window and add it here or in http://sqlfiddle.com/. Only keep the CREATE TABLE ... part down to the first GO.
You can then do the same thing for all your indexes.
You should also add a copy of your query plan.
In SSMS, go to Query menu / Display Estimated Execution Plan and right click to save it as xml (xml is better). It is only an estimation and it won't execute the whole query. It should be pretty fast.

SQL way to get the MD5 or SHA1 of an entire row

Is there a "semi-portable" way to get the md5() or the sha1() of an entire row? (Or better, of an entire group of rows ordered by all their fields, i.e. order by 1,2,3,...,n)? Unfortunately not all DBs are PostgreSQL... I have to deal with at least microsoft SQL server, Sybase, and Oracle.
Ideally, I'd like to have an aggregator (server side) and use it to detect changes in groups of rows. For example, in tables that have some timestamp column, I'd like to store a unique signature for, say, each month. Then I could quickly detect months that have changed since my last visit (I am mirroring certain tables to a server running Greenplum) and re-load those.
I've looked at a few options, e.g. checksum(*) in tsql (horror: it's very collision-prone, since it's based on a bunch of XORs and 32-bit values), and hashbytes('MD5', field), but the latter can't be applied to an entire row. And that would give me a solution just for one of the SQL flavors I have to deal with.
Any idea? Even for just one of the SQL idioms mentioned above, that would be great.
You could calculate the hashbytes value for the entire row in an update trigger. I used this as part of an ETL process where previously they were comparing all columns in the tables; the speed increase was huge.
Hashbytes works on varchar, nvarchar, or varbinary datatypes. I wanted to compare integer keys and text fields, and casting everything would have been a nightmare, so I used the FOR XML clause in SQL Server as follows:
CREATE TRIGGER get_hash_value ON staging_table
FOR UPDATE, INSERT AS
UPDATE staging_table
SET sha1_hash = (SELECT hashbytes('sha1', (SELECT col1, col2, col3 FOR XML RAW)))
GO
Alternatively, if you plan to do many updates on all the rows, you could calculate the values in a similar way outside of a trigger by using a subquery with the FOR XML clause as well. If going this route, you can even change it to a SELECT *, but not in the trigger, as each time you ran it you would get a different value (because the sha1_hash column itself would be different each time).
You could also modify the SELECT statement to cover more than one row.
In MS SQL you can use HashBytes across the entire row by using XML:
SELECT MBT.id,
       hashbytes('MD5',
           (SELECT MBT.*
            FROM (VALUES (NULL)) foo(bar)
            FOR XML AUTO)) AS [Hash]
FROM <Table> AS MBT;
You need the FROM (VALUES (NULL)) foo(bar) clause to be able to use XML AUTO; it serves no other purpose.

Can queries that read table variables generate parallel exection plans in SQL Server 2008?

First, from BOL:
Queries that modify table variables do not generate parallel query execution plans. Performance can be affected when very large table variables, or table variables in complex queries, are modified. In these situations, consider using temporary tables instead. For more information, see CREATE TABLE (Transact-SQL). Queries that read table variables without modifying them can still be parallelized.
That seems clear enough. Queries that read table variables, without modifying them, can still be parallelized.
But then over at SQL Server Storage Engine, an otherwise reputable source, Sunil Agarwal said this in an article on tempdb from March 30, 2008:
Queries involving table variables don't generate parallel plans.
Was Sunil paraphrasing BOL re: INSERT, or does the presence of table variables in the FROM clause prevent parallelism? If so, why?
I am thinking specifically of the control table use case, where you have a small control table being joined to a larger table, to map values, act as a filter, or both.
Thanks!
OK, I have a parallel select but not on the table variable
I've anonymised it and:
BigParallelTable is 900k rows and wide
For legacy reasons, BigParallelTable is partially denormalised (I'll fix it, later, promise)
BigParallelTable often generates parallel plans because it's not ideal and is "expensive"
SQL Server 2005 x64, SP3, build 4035, 16 cores
Query + plan:
DECLARE @FilterList TABLE (bar varchar(100) NOT NULL)

INSERT @FilterList (bar)
SELECT 'val1' UNION ALL SELECT 'val2' UNION ALL SELECT 'val3'
--snipped
SELECT *
FROM dbo.BigParallelTable BPT
JOIN @FilterList FL ON BPT.Thing = FL.bar
StmtText
|--Parallelism(Gather Streams)
   |--Hash Match(Inner Join, HASH:([FL].[bar])=([BPT].[Thing]), RESIDUAL:(@FilterList.[bar] as [FL].[bar]=[MyDB].[dbo].[BigParallelTable].[Thing] as [BPT].[Thing]))
      |--Parallelism(Distribute Streams, Broadcast Partitioning)
      |  |--Table Scan(OBJECT:(@FilterList AS [FL]))
      |--Clustered Index Scan(OBJECT:([MyDB].[dbo].[BigParallelTable].[PK_BigParallelTable] AS [BPT]))
Now, thinking about it, a table variable is almost always a table scan, has no stats, and is assumed to hold one row ("Estimated number of rows = 1", "Actual... = 3").
Can we declare that table variables are not read in parallel, but the containing plan can use parallelism elsewhere? If so, BOL is correct and the SQL Storage Engine article is wrong.
Simple Example showing a parallel operator on a table variable itself.
DECLARE @T TABLE
(
    X INT
)

INSERT INTO @T
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM master..spt_values v1, master..spt_values v2;

WITH E8(N)
AS (SELECT 1
    FROM @T a,
         @T b),
Nums(N)
AS (SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 0))
    FROM E8)
SELECT COUNT(N)
FROM Nums
OPTION (RECOMPILE)
[Answering my own question here, so I can present the relevant quotes appropriately....]
Boris B, from a thread at the MSDN SQL Server forums:
Read-only queries that use table variables can still be parallelized. Queries that involve table variables that are modified run serially. We will correct the statement in Books Online. (emphasis added)
and:
Note that there are two flavors of parallelism support:
A. The operator can/can not be in a parallel thread
B. The query can/can not be run in parallel because this operator exists in the tree.
B is a superset of A.
As best I can tell, table variables are not B and may be A.
Another relevant quote, re: non-inlined T-SQL TVFs:
Non-inlined T-SQL TVFs...is considered for parallelism if the TVF inputs are run-time constants, e.g. variables and parameters. If the input is a column (from a cross apply) then parallelism is disabled for the whole statement.
My understanding is that parallelism is blocked on table variables for UPDATE/DELETE/INSERT operations, but not for SELECTs. Proving that would be a lot more difficult than just hypothesizing, of course. :-)

Is there efficient SQL to query a portion of a large table

The typical way of selecting data is:
select * from my_table
But what if the table contains 10 million records and you only want records 300,010 to 300,020?
Is there a way to create a SQL statement on Microsoft SQL that only gets 10 records at once?
E.g.
select * from my_table from records 300,010 to 300,020
This would be way more efficient than retrieving 10 million records across the network, storing them in the IIS server and then counting to the records you want.
SELECT * FROM my_table is just the tip of the iceberg. Assuming you're talking a table with an identity field for the primary key, you can just say:
SELECT * FROM my_table WHERE ID >= 300010 AND ID <= 300020
You should also know that selecting * is considered poor practice in many circles. They want you to specify the exact column list.
Try looking at info about pagination. Here's a short summary of it for SQL Server.
Absolutely. On MySQL and PostgreSQL (the two databases I've used), the syntax would be
SELECT [columns] FROM table LIMIT 10 OFFSET 300010;
On MS SQL, it's something like SELECT TOP 10 ...; I don't know the syntax for offsetting the record list.
Note that you never want to use SELECT *; it's a maintenance nightmare if anything ever changes. This query, though, is going to be incredibly slow since your database will have to scan through and throw away the first 300,010 records to get to the 10 you want. It'll also be unpredictable, since you haven't told the database which order you want the records in.
This is the core of SQL: tell it which 10 records you want, identified by a key in a specific range, and the database will do its best to grab and return those records with minimal work. Look up any tutorial on SQL for more information on how it works.
When working with large tables, it is often a good idea to make use of Partitioning techniques available in SQL Server.
The rules of your partition function typically dictate that only a range of data can reside within a given partition. You could split your partitions by date range or ID, for example.
In order to select from a particular partition you would use a query similar to the following.
SELECT <Column Name1>, <Column Name2>, ...
FROM <Table Name>
WHERE $PARTITION.<Partition Function Name>(<Column Name>) = <Partition Number>
Take a look at the following white paper for more detailed information on partitioning in SQL Server 2005.
http://msdn.microsoft.com/en-us/library/ms345146.aspx
I hope this helps; however, please feel free to pose further questions.
Cheers, John
I use wrapper queries around the core query and then isolate just the ROW numbers that I wish to take from it. This allows the SQL server to do all the heavy lifting inside the CORE query and pass back only the small part of the table that I requested. All you need to do is pass the [start_row_variable] and the [end_row_variable] into the SQL query.
NOTE: the ORDER clause is specified OUTSIDE the core query, as [sql_order_clause].
w1 and w2 are the aliases of the wrapper (derived) tables.
SELECT w1.*
FROM (
    SELECT w2.*,
           ROW_NUMBER() OVER ([sql_order_clause]) AS ROW
    FROM (
        --- CORE QUERY START ---
        SELECT [columns]
        FROM [table_name]
        WHERE [sql_string]
        --- CORE QUERY END ---
    ) AS w2
) AS w1
WHERE ROW BETWEEN [start_row_variable] AND [end_row_variable]
This method has hugely optimized my database systems. It works very well.
IMPORTANT: Be sure to always explicitly specify only the exact columns you wish to retrieve in the core query as fetching unnecessary data in these CORE queries can cost you serious overhead
Use TOP to select only a limited amount of rows, like:
SELECT TOP 10 * FROM my_table WHERE ID >= 300010
Add an ORDER BY if you want the results in a particular order.
To be efficient there has to be an index on the ID column.