Create a function with whole columns as input and output - sql

I have several programs written in R that I now need to translate into T-SQL to deliver to the client. I am new to T-SQL and I'm facing some difficulties in translating all my R functions.
An example is the numerical derivative function, which for two input columns (values and time) would return another column (of the same length) with the computed derivative.
My current understanding is:
I can't use an SP, because I'll need to use these functions inline in a SELECT statement, like:
SELECT Customer_ID, Date, Amount, derivative(Amount, Date) FROM Customer_Detail
I can't use a UDF, because UDFs only accept scalars as input parameters. I need vectorised functions, both for speed and because for some of my functions, like the one above, running row by row wouldn't be meaningful (each value needs the next and the previous one).
A UDA takes a whole column but, as the name says, it aggregates the column, like SUM or AVG would.
If the above is correct, which other techniques would allow me to create the type of function I need? An example of a built-in SQL function similar to what I'm after is SQUARE(), which (apparently) takes a column and returns it squared. My goal is to create a library of functions which behave like SQUARE, POWER, etc., but internally they'll be different, because SQUARE reads and returns each scalar as the rows are read. I would like to know whether it is possible to have a user-defined function with an accumulate method (like a UDA) that operates on all the data at the end of the import and then returns a column of the same length.
NB: At the moment I'm on SQL Server 2005, but we'll switch soon to 2012 (or possibly 2014 in a few months), so answers based on any 2005+ version of SQL Server are fine.
EDIT: added the R tag for R developers who have, hopefully, already faced such difficulties.
EDIT2: Added the CLR tag. I went through CLR user-defined aggregates as described in the Pro T-SQL 2005 Programmer's Guide. I already said above that this type of function wouldn't fit my needs, but it was worth looking into. The four methods a UDA needs are Init, Accumulate, Merge and Terminate. My requirement needs the whole data set to be analysed together by the same instance of the UDA, so options involving Merge methods that combine partial results from multicore processing won't work.

I think you may want to change your mind a bit. The SQL language is very good when working with sets of data, especially in modern RDBMS implementations (like SQL Server 2012), but you have to think in sets, not in rows or columns. While I still don't know your exact tasks, let's see: SQL Server 2012 has a very nice set of window functions + ranking functions + analytic functions + common table expressions, so you can write almost any query inline. You can use chains of common table expressions to turn your data any way you want, to calculate running totals, to calculate averages or other aggregates over a window, and so on.
Actually, I've always liked SQL, and when I learned a bit of functional languages (ML and Scala), my thought was that my approach to SQL is very similar to the functional paradigm - just slicing and dicing data without saving anything into variables, until you have the result set you need.
Just a quick example: here's a question from SO - How to get average of the 'middle' values in a group? The goal was to get, for each group, the average of the middle 3 values:
TEST_ID  TEST_VALUE  GROUP_ID
1        5           1        -+
2        10          1         +- these values for group_id = 1
3        15          1        -+
4        25          2        -+
5        35          2         +- these values for group_id = 2
6        5           2        -+
7        15          2
8        25          3
9        45          3        -+
10       55          3         +- these values for group_id = 3
11       15          3        -+
12       5           3
13       25          3
14       45          4         +- this value for group_id = 4
For me, it's not an easy task to do in R, but in SQL it could be a really simple query like this:
with cte as (
    select
        *,
        row_number() over(partition by group_id order by test_value) as rn,
        count(*) over(partition by group_id) as cnt
    from test
)
select
    group_id, avg(test_value)
from cte
where
    cnt <= 3 or
    (rn >= cnt / 2 - 1 and rn <= cnt / 2 + 1)
group by group_id
You can also easily expand this query to get 5 values around the middle.
Take a closer look at the analytic functions and try to rethink your calculations in terms of window functions; it may not be that hard to rewrite your R procedures in plain SQL.
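For instance, here is a hedged sketch of the derivative(Amount, Date) from your question, assuming SQL Server 2012's LAG function and the Customer_Detail columns you showed; the day-based difference quotient is only one possible discretisation:
-- Hedged sketch: approximating derivative(Amount, Date) with LAG (SQL Server 2012+).
-- Column and table names come from the question.
SELECT
    Customer_ID,
    [Date],
    Amount,
    (Amount - LAG(Amount) OVER (PARTITION BY Customer_ID ORDER BY [Date])) * 1.0
        / NULLIF(DATEDIFF(day,
                          LAG([Date]) OVER (PARTITION BY Customer_ID ORDER BY [Date]),
                          [Date]), 0) AS derivative
FROM Customer_Detail;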
Hope it helps.

I would solve this by passing a reference to the record(s) you want to process and using a so-called "inline table-valued function" to return the record(s) after processing the initial records.
You can find the table-valued function reference here:
http://technet.microsoft.com/en-en/library/ms186755.aspx
A Sample:
CREATE FUNCTION Sales.CustomerExtendedInfo (@CustomerID int)
RETURNS TABLE
AS
RETURN
(
    SELECT FirstName + LastName AS CompleteName,
           DATEDIFF(Day, CreateDate, GetDate()) AS DaysSinceCreation
    FROM Customer_Detail
    WHERE CustomerID = @CustomerID
);
GO
CustomerID would be the primary key of the record(s) you want to process.
The table function can afterwards be joined to other query results if you want to process more than one record at once.
Here is a Sample:
SELECT * FROM Customer_Detail
CROSS APPLY Sales.CustomerExtendedInfo (CustomerID)
Using a normal Stored Procedure would do the same more or less, but it's a bit tricky to work with the results programmatically.
But keep one thing in mind: SQL Server is not really good at "functional programming". It's brilliant at working with data and sets of data, but the more you use it as an "application server", the more you will realize it's not made for that.

I don't think this is possible in pure T-SQL without using cursors, but with cursors things will usually be very slow. Cursors process the table row by row, and some people call this "slow-by-slow".
But you can create your own aggregate function (see Technet for more details). You have to implement the function using the .NET CLR (e.g. C# or R.NET).
For a nice example see here.
I think interfacing R with SQL is a very nice solution. Oracle offers this combo as a commercial product, so why not go the same way with SQL Server.
When integrating R in your code through your own aggregate functions, you will only pay a small performance penalty. Custom aggregate functions are quite fast according to the Microsoft documentation: "Managed code generally performs slightly slower than built-in SQL Server aggregate functions". The R.NET solution also seems quite fast, since it loads the native R DLL directly into the running process, so it should be much faster than using R over ODBC.
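For reference, the T-SQL side of registering such a CLR aggregate might look like the sketch below; the assembly path, names and signature are placeholders for whatever you build in .NET, multi-parameter aggregates need SQL Server 2008+, and loading a native DLL such as R typically forces PERMISSION_SET = UNSAFE.
-- Hedged sketch: registering a CLR aggregate; all names and the path are placeholders.
CREATE ASSEMBLY MyRAggregates
FROM 'C:\clr\MyRAggregates.dll'          -- compiled .NET assembly (placeholder path)
WITH PERMISSION_SET = UNSAFE;            -- native interop (e.g. R.NET) usually requires UNSAFE
GO
CREATE AGGREGATE dbo.Derivative (@amount float, @date datetime)   -- multi-parameter aggregates: 2008+
RETURNS float
EXTERNAL NAME MyRAggregates.[Derivative];
GO
-- Usage sketch:
-- SELECT Customer_ID, dbo.Derivative(Amount, [Date]) FROM Customer_Detail GROUP BY Customer_ID;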

ORIGINAL RESPONSE:
If you already know which functions you will need, one approach I can think of is creating one inline table-valued function for each method/operation you want to apply per table.
What do I mean by that? For example, you mentioned the Customer_Detail table; when you select from it you might need one method, "derivative(Amount, Date)". Let's say a second method you might need (I'm just making this up for the sake of explanation) is "derivative1(Amount1, Date1)".
We create two inline functions; each does its own calculation on the intended columns inside the function and also returns the remaining columns as they are. That way you get all the columns you would get from the table and also perform the custom calculation as a set-based operation instead of a scalar one.
Later you can combine the independent column calculations into the same function if that makes sense.
You can still use all these functions and JOIN them to get all the custom calculations in a single result set if needed, as all the functions return the common/unprocessed columns as they are.
See the example below.
IF OBJECT_ID('Product','U') IS NOT NULL
    DROP TABLE Product
GO
CREATE TABLE Product
(
     pname    sysname        NOT NULL
    ,pid      INT            NOT NULL
    ,totalqty INT            NOT NULL DEFAULT 1
    ,uprice   NUMERIC(28,10) NOT NULL DEFAULT 0
)
GO
INSERT INTO Product( pname, pid, totalqty, uprice )
SELECT 'pen',1,100,1.2
UNION ALL SELECT 'book',2,300,10.00
UNION ALL SELECT 'lock',3,500,15.00
GO

IF OBJECT_ID('ufn_Product_totalValue','IF') IS NOT NULL
    DROP FUNCTION ufn_Product_totalValue
GO
CREATE FUNCTION ufn_Product_totalValue
(
     @newqty    INT
    ,@newuprice NUMERIC(28,10)
)
RETURNS TABLE AS
RETURN
(
    SELECT pname, pid, totalqty, uprice, totalqty*uprice AS totalValue
    FROM
    (
        SELECT
             pname
            ,pid
            ,totalqty + @newqty    AS totalqty
            ,uprice   + @newuprice AS uprice
        FROM Product
    ) qry
)
GO

IF OBJECT_ID('ufn_Product_totalValuePct','IF') IS NOT NULL
    DROP FUNCTION ufn_Product_totalValuePct
GO
CREATE FUNCTION ufn_Product_totalValuePct
(
     @newqty    INT
    ,@newuprice NUMERIC(28,10)
)
RETURNS TABLE AS
RETURN
(
    SELECT pname, pid, totalqty, uprice, totalqty*uprice/100 AS totalValuePct
    FROM
    (
        SELECT
             pname
            ,pid
            ,totalqty + @newqty    AS totalqty
            ,uprice   + @newuprice AS uprice
        FROM Product
    ) qry
)
GO

SELECT * FROM ufn_Product_totalValue(10,5)
SELECT * FROM ufn_Product_totalValuePct(10,5)

SELECT tv.pname, tv.pid, tv.totalValue, pct.totalValuePct
FROM ufn_Product_totalValue(10,5) tv
JOIN ufn_Product_totalValuePct(10,5) pct
    ON tv.pid = pct.pid
EDIT2:
Three-point smoothing algorithm:
IF OBJECT_ID('Test3PointSmoothingAlgo','U') IS NOT NULL
    DROP TABLE Test3PointSmoothingAlgo
GO
CREATE TABLE Test3PointSmoothingAlgo
(
     qty INT          NOT NULL
    ,id  INT IDENTITY NOT NULL
)
GO
INSERT Test3PointSmoothingAlgo( qty ) SELECT 10 UNION SELECT 20 UNION SELECT 30
GO

IF OBJECT_ID('ufn_Test3PointSmoothingAlgo_qty','IF') IS NOT NULL
    DROP FUNCTION ufn_Test3PointSmoothingAlgo_qty
GO
CREATE FUNCTION ufn_Test3PointSmoothingAlgo_qty
(
    @ID INT --this is a dummy parameter
)
RETURNS TABLE AS
RETURN
(
    WITH CTE_3PSA(SmoothingPoint, Coefficients)
    AS --finding the IDs of the adjacent points
    (
        SELECT id, id
        FROM Test3PointSmoothingAlgo
        UNION
        SELECT id, id - 1
        FROM Test3PointSmoothingAlgo
        UNION
        SELECT id, id + 1
        FROM Test3PointSmoothingAlgo
    )
    --apply the three-point smoothing formula
    SELECT a.SmoothingPoint, SUM(ISNULL(b.qty,0))/3 AS Qty_Smoothed
    FROM CTE_3PSA a
    LEFT JOIN Test3PointSmoothingAlgo b
        ON a.Coefficients = b.id
    GROUP BY a.SmoothingPoint
)
GO
SELECT SmoothingPoint, Qty_Smoothed FROM dbo.ufn_Test3PointSmoothingAlgo_qty(NULL)

I think you may need to break your functionality into two parts: UDAs, which can work over scopes thanks to the OVER (...) clause, and formulas which combine the resulting scalars.
What you are asking for - defining objects in such a way as to make an aggregate/scalar combo - is probably outside the scope of SQL Server's regular capabilities, unless you fall back on CLR code that would effectively be equivalent to a cursor in terms of performance, or worse.
Your best shot is probably to define an SP (I know you don't want that) that produces the whole result, e.g. a [derivative] stored procedure that takes table and column names as parameters. You can even expand on the idea, but in the end that's not exactly what you want.

Since you mention you will be upgrading to SQL Server 2012: SQL Server 2008 introduced Table-Valued Parameters.
This feature will do what you want. You will have to define a user-defined table type in your DB, which is like a table definition with columns and their respective types.
You can then use that type as a parameter type for any other stored procedure or function in your DB.
You can combine these table types with CLR integration to achieve what you require.
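A hedged sketch of the mechanics (SQL Server 2008+); the type, procedure and column names below are illustrative, loosely reusing the Customer_Detail columns from the question:
-- Hedged sketch: a user-defined table type plus a procedure that receives it as a
-- read-only table-valued parameter.
CREATE TYPE dbo.AmountSeries AS TABLE
(
    [Date] datetime      NOT NULL,
    Amount numeric(18,4) NOT NULL
);
GO
CREATE PROCEDURE dbo.ProcessSeries
    @series dbo.AmountSeries READONLY
AS
BEGIN
    -- Any set-based processing over the whole series goes here; this just echoes it back ordered.
    SELECT [Date], Amount
    FROM @series
    ORDER BY [Date];
END
GO
-- Usage: load the column data into a variable of the table type, then pass it in.
DECLARE @s dbo.AmountSeries;
INSERT INTO @s ([Date], Amount)
SELECT [Date], Amount FROM Customer_Detail;
EXEC dbo.ProcessSeries @series = @s;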
As mentioned, SQL is not good when you are comparing rows to other rows; it's much better at set-based operations where every row is treated as an independent entity.
But before looking at cursors and CLR, you should make sure it can't be done in pure T-SQL, which will almost always be faster and scale better as your table grows.
One method for comparing rows based on order is to wrap your data in a CTE, adding a ranking function like ROW_NUMBER to set the row order, followed by a self-join of the CTE onto itself.
The join is performed on the ordered field, e.g. ROW_NUMBER = (ROW_NUMBER - 1).
Look at this article for an example
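A rough sketch of that pattern, reusing the Customer_Detail columns from the question (ROW_NUMBER is available from SQL Server 2005, so no LAG/LEAD is needed):
-- Hedged sketch: ROW_NUMBER + self-join on adjacent rows (SQL Server 2005+).
WITH Ordered AS
(
    SELECT Customer_ID, [Date], Amount,
           ROW_NUMBER() OVER (PARTITION BY Customer_ID ORDER BY [Date]) AS rn
    FROM Customer_Detail
)
SELECT cur.Customer_ID, cur.[Date], cur.Amount,
       (cur.Amount - prev.Amount) * 1.0
           / NULLIF(DATEDIFF(day, prev.[Date], cur.[Date]), 0) AS derivative
FROM Ordered cur
LEFT JOIN Ordered prev
    ON  prev.Customer_ID = cur.Customer_ID
    AND prev.rn = cur.rn - 1;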

Related

How can I create a temporary numbers table with SQL?

So I came upon a question where someone asked for a list of unused account numbers. The query I wrote for it works, but it is kind of hacky and relies on the existence of a table with more records than existing accounts:
WITH tmp AS (
    SELECT ROW_NUMBER() OVER (ORDER BY cusno) a
    FROM custtable
    FETCH FIRST 999999 ROWS ONLY
)
SELECT tmp.a
FROM tmp
WHERE a NOT IN (SELECT cusno FROM custtable)
This works because customer numbers are reused and there are significantly more records than unique customer numbers. But, like I said, it feels hacky and I'd like to just generate a temporary table with 1 column and x records that are numbered 1 through x. I looked at some recursive solutions, but all of it looked way more involved than the solution I wound up using. Is there an easier way that doesn't rely on existing tables?
I think the simple answer is no. To be able to make a determination of absence, the platform needs to know the expected data set. You can either generate that as a temporary table or data set at runtime - using the method you've used (or a variation thereof) - or you can create a reference table once, and compare against it each time. I'd favour the latter - a table with a single column of integers won't put much of a dent in your disk space and it doesn't make sense to compute an identical result set over and over again.
Here's a really good article from Aaron Bertrand that deals with this very issue:
https://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
(Edit: The queries in that article are TSQL specific, but they should be easily adaptable to DB2 - and the underlying analysis is relevant regardless of platform)
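For reference, here is a hedged sketch of one of the generator patterns that article compares, in the T-SQL flavour (SQL Server 2008+ row constructors); custtable/cusno are the names from your query, and you would widen the cross joins for a larger range:
-- Hedged sketch: a stacked-CTE number generator used to list unused account numbers.
WITH
e1   AS (SELECT 1 AS n FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t(n)), -- 10 rows
e2   AS (SELECT 1 AS n FROM e1 a CROSS JOIN e1 b),                                     -- 100 rows
e4   AS (SELECT 1 AS n FROM e2 a CROSS JOIN e2 b),                                     -- 10,000 rows
nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM e4)
SELECT n
FROM nums
WHERE n NOT IN (SELECT cusno FROM custtable);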
If you are searching for all unused account numbers, you can do it like this:
with MaxNumber as
(
    select max(cusno) MaxID from custtable
),
RecurceNumber (id) as
(
    values 1
    union all
    select id + 1 from RecurceNumber cross join MaxNumber
    where id <= MaxID
)
select f1.* from RecurceNumber f1 exception join custtable f2 on f1.id = f2.cusno

Updating a subset of data through a CTE

Question
I've just come across the concept of using update statements on CTEs.
This seems like a great approach, but I've not seen it used before, and the context in which I was introduced to it (i.e. I uncovered it in some badly written code) suggests the author didn't know what they were doing.
Is anyone aware of any reason not to perform updates on CTEs, or of any considerations which should be made when doing so (assuming the CTE gives some benefit, such as allowing you to update an arbitrary subset of data)?
Full Info
I recently found some horrendous code in our production environment where someone had clearly been experimenting with ways to update a single row of data. I've tidied up the layout to make it readable, but have left the original logic as is.
CREATE PROCEDURE [dbo].[getandupdateWorkOrder]
    -- Add the parameters for the stored procedure here
    -- @p1 xml Output
AS
BEGIN
    WITH T AS
    (
        SELECT XMLDoc
             , Retrieved
        FROM [Transcode].[dbo].[WorkOrder]
        WHERE WorkOrderId IN
        (
            SELECT TOP 1 WorkOrderId
            FROM
            (
                SELECT DISTINCT(WorkOrderId)
                     , Retrieved
                FROM [Transcode].[dbo].[WorkOrder]
                WHERE Retrieved = 0
            ) AS c
        )
        AND Retrieved = 0
    )
    UPDATE T
    SET Retrieved = 1
    OUTPUT inserted.XMLDoc
END
I can easily update this to the below without affecting the logic:
CREATE PROCEDURE [dbo].[GetAndUpdateWorkOrder]
AS
BEGIN
    WITH T AS
    (
        SELECT TOP 1 XMLDoc
             , Retrieved
        FROM [WorkOrder]
        WHERE Retrieved = 0
    )
    UPDATE T
    SET Retrieved = 1
    OUTPUT inserted.XMLDoc
END
However the code also introduced me to a new concept; that you could update CTEs / see those updates in the underlying tables (I'd previously assumed that CTEs were read only in-memory copies of the data selected from the original table, and thus not possible to amend).
Had I not seen the original code, but needed something which behaved like this I'd have written it as follows:
CREATE PROCEDURE [dbo].[GetAndUpdateWorkOrder]
AS
BEGIN
    UPDATE [WorkOrder]
    SET Retrieved = 1
    OUTPUT inserted.XMLDoc
    WHERE Id IN
    (
        SELECT TOP 1 Id
        FROM [WorkOrder]
        WHERE Retrieved = 0
        --ORDER BY Id --I'd have included this too; but not including it here to ensure my rewrite is *exactly* the same as the original in terms of functionality, including the unpredictable part (the bonus of not including this is a performance benefit; though that's negligible given the data in this table)
    )
END
The code which performs the update via the CTE looks much cleaner (i.e. you don't even need to rely on a unique id for this to work).
However, because the rest of the original code is badly written, I'm apprehensive about this new technique, so I want to see what the experts say about this approach before adding it to my arsenal.
Updating CTEs is fine. There are limitations on the subqueries that you can use (such as no aggregations).
However, you have a misconception about CTEs in SQL Server. They do not create in-memory tables. Instead, they operate more like views, where the code is included in the query, and the overall query is then optimized. Note: this behavior differs from other databases, and there is no way to override it, even with a hint.
This is an important distinction. If you have a complex CTE and use it more than once, then it will typically execute for each reference in the overall query.
Updating through CTEs is fine. It's especially handy when you have to deal with window functions. For example, you can use this query to give the top 10 performing employees in each department a 10% raise:
WITH TopPerformers AS
(
SELECT DepartmentID, EmployeeID, Salary,
RANK() OVER (PARTITION BY DepartmentID ORDER BY PerformanceScore DESC) AS EmployeeRank
FROM Employees
)
UPDATE TopPerformers
SET Salary = Salary * 1.1
WHERE EmployeeRank <= 10
(I'm ignoring the fact that there can be more than 10 employees per department when many have the same score, but that's beside the point here.)
Nice, clean and easy to understand. I see a CTE as a temporary view, so I tend to follow what Microsoft says about updating views. See the Updatable Views section on this page.

PostgreSQL query decomposition

I'm failing to decompose simple SQL queries. I use PostgreSQL, but my question also applies to other RDBMSs.
Consider the following example. We have a table orders, and we want to find the first order after which the total amount exceeded some limit:
drop table if exists orders cascade;
/**
Table with clients' orders
*/
create table orders(
date timestamp,
amount integer
/**
Other columns omitted
*/
);
/**
Populate with test data
*/
insert into orders(date,amount)
values
('2011-01-01',50),
('2011-01-02',49),
('2011-01-03',2),
('2011-01-04',1000);
/**
Selects first order that caused exceeding of limit
*/
create view first_limit_exceed
as
select min(date) from
(
select o1.date
from orders o1,
orders o2
where o2.date<=o1.date
group by o1.date
having sum(o2.amount) > 100
) limit_exceed;
/**
returns "2011-01-03 00:00:00"
*/
select * from first_limit_exceed;
Now let's make the problem a little harder. Suppose we want to find the total amount only for rows that satisfy some predicate. We have a lot of such predicates, and creating a separate version of the view first_limit_exceed for each would be terrible code duplication. So we need some way to create a parameterized view and pass it either a filtered set of rows or the predicate itself.
In Postgres we can use query-language functions as parameterized views. But Postgres does not allow a function to take either a set of rows or another function as an argument.
I can still use string interpolation on the client side or in a PL/pgSQL function, but that is error-prone and hard to test and debug.
Any advice?
In PostgreSQL 8.4 and later:
SELECT *
FROM (
SELECT *,
SUM(amount) OVER (ORDER BY date) AS psum
FROM orders
) q
WHERE psum > 100
ORDER BY
date
LIMIT 1
Add any predicates you want into the inner query:
SELECT *
FROM (
SELECT *,
SUM(amount) OVER (ORDER BY date) AS psum
FROM orders
WHERE date >= '2011-01-03'
) q
WHERE psum > 100
ORDER BY
date
LIMIT 1
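If you do want to package this as a parameterized view, one hedged sketch is a set-returning SQL-language function taking scalar parameters; the function name and output aliases below are illustrative, and positional $1/$2 keep it compatible with 8.4:
-- Hedged sketch: a SQL-language function used as a parameterized "view".
-- $1 = amount threshold, $2 = lower bound on date.
create or replace function first_limit_exceed_param(integer, timestamp)
returns table(o_date timestamp, o_amount integer, o_running_total bigint) as $$
    select date, amount, running_total
    from (
        select date, amount, sum(amount) over (order by date) as running_total
        from orders
        where date >= $2
    ) q
    where running_total > $1
    order by date
    limit 1;
$$ language sql stable;

-- Usage:
select * from first_limit_exceed_param(100, '2011-01-01');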
It sounds a bit like you're trying to put too much code into the database. If you are interested in the rows of a certain relation that satisfy a particular predicate, just execute a select statement with an appropriate where clause in the client code. Having views that take predicates as parameters is reinventing the wheel that sql already solves nicely.
On the other hand, I can see an argument for storing the queries themselves in the database, so that they can be composed into larger reports. This too is still better handled by application code. I might approach a problem like that by using a library that's good at dynamic SQL generation (for example SQLAlchemy), and then storing the query representations (SQLAlchemy expression objects are 'pickleable') as blobs in the database.
To put it another way, databases are representers of facts; you store knowledge in them. Applications have the duty of acting on user requests. When you find yourself defining transformations on the data, that's really more a matter of anticipating and implementing the requests of actual users, rather than just faithfully preserving knowledge.
Views are best used when the schema inevitably changes, so you can leave older applications that don't need to know about the new schema in a working state.

Numbering rows in a view

I am connecting to a SQL database from a PLC, and need to return a list of values. Unfortunately, the PLC has limited memory and can only retrieve approximately 5,000 values at any one time; however, the database may contain up to 10,000 values.
As such I need a way of retrieving these values in 2 operations. Unfortunately the PLC is limited in the queries it can perform: only SELECT and WHERE clauses, so I cannot use LIMIT or TOP or anything like that.
Is there a way in which I can create a view and auto-number every record in that view? I could then query all records < 5,000, followed by a second query of < 10,000, etc.
Unfortunately it seems that views do not support the identity column, so this would need to be done manually.
Anyone any suggestions? My only realistic option at the moment seems to be to create 2 views, one with the first 5,000 and 1 with the next 5,000...
I am using SQL Server 2000 if that makes a difference...
There are 2 solutions. The easiest is to modify your SQL table and add an IDENTITY column. If that is not a possibility, then you'll have to do something like the query below. For 10,000 rows it shouldn't be too slow, but as the table grows it will perform worse and worse.
SELECT Col1, Col2,
       (SELECT COUNT(i.Col1)
        FROM yourtable i
        WHERE i.Col1 <= o.Col1) AS RowID
FROM yourtable o
While the code provided by Derek does what I asked - i.e. it numbers each row in the view - the performance is really poor: approximately 20 seconds to number 100 rows. As such it is not a workable solution. An alternative is to number the first 5,000 records with a 1, and the next 5,000 with a 2. This can be done with 3 simple queries, and is far quicker to execute.
The code to do so is as follows:
SELECT TOP(5000) BCode, SAPCode, 1 as GroupNo FROM dbo.DB
UNION
SELECT TOP (10000) BCode, SAPCode, 2 as GroupNo FROM dbo.DB p
WHERE ID NOT IN (SELECT TOP(5000) ID FROM dbo.DB)
Although, as pointed out by Andriy M, you should also specify an explicit sort, to ensure that you don't miss any records.
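As a hedged sketch of what that might look like (SQL Server 2000-compatible, assuming the ID column from the query above is a sensible sort key):
-- Hedged sketch: both halves drawn from a deterministic ORDER BY ID ordering.
SELECT a.BCode, a.SAPCode, 1 AS GroupNo
FROM (SELECT TOP 5000 ID, BCode, SAPCode FROM dbo.DB ORDER BY ID) AS a
UNION ALL
SELECT b.BCode, b.SAPCode, 2 AS GroupNo
FROM (SELECT TOP 5000 ID, BCode, SAPCode
      FROM dbo.DB
      WHERE ID NOT IN (SELECT TOP 5000 ID FROM dbo.DB ORDER BY ID)
      ORDER BY ID) AS b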
One possibility might be to use a function with a temporary table such as
CREATE FUNCTION dbo.OrderedBCodeData()
RETURNS @Data TABLE (RowNumber int IDENTITY(1,1), BCode int, SAPCode int)
AS
BEGIN
    INSERT INTO @Data (BCode, SAPCode)
    SELECT BCode, SAPCode FROM dbo.DB ORDER BY BCode
    RETURN
END
And select from this function such as
SELECT * FROM dbo.OrderedBCodeData() WHERE RowNumber BETWEEN 5000 AND 10000
I haven't ever used this in production; in fact it was just a quick idea this morning, but it could be worth exploring as a neater alternative.

Is there efficient SQL to query a portion of a large table

The typical way of selecting data is:
select * from my_table
But what if the table contains 10 million records and you only want records 300,010 to 300,020?
Is there a way to create a SQL statement on Microsoft SQL that only gets 10 records at once?
E.g.
select * from my_table from records 300,010 to 300,020
This would be way more efficient than retrieving 10 million records across the network, storing them in the IIS server and then counting to the records you want.
SELECT * FROM my_table is just the tip of the iceberg. Assuming you're talking about a table with an identity field for the primary key, you can just say:
SELECT * FROM my_table WHERE ID >= 300010 AND ID <= 300020
You should also know that selecting * is considered poor practice in many circles; they want you to specify the exact column list.
Try looking at info about pagination. Here's a short summary of it for SQL Server.
Absolutely. On MySQL and PostgreSQL (the two databases I've used), the syntax would be
SELECT [columns] FROM table LIMIT 10 OFFSET 300010;
On MS SQL, it's something like SELECT TOP 10 ...; I don't know the syntax for offsetting the record list.
Note that you never want to use SELECT *; it's a maintenance nightmare if anything ever changes. This query, though, is going to be incredibly slow since your database will have to scan through and throw away the first 300,010 records to get to the 10 you want. It'll also be unpredictable, since you haven't told the database which order you want the records in.
This is the core of SQL: tell it which 10 records you want, identified by a key in a specific range, and the database will do its best to grab and return those records with minimal work. Look up any tutorial on SQL for more information on how it works.
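For completeness: later SQL Server versions (2012 and later) do have an offset syntax. A hedged sketch, reusing the [columns] placeholder from above and assuming you order by ID:
-- Hedged sketch: OFFSET/FETCH paging (SQL Server 2012+); an ORDER BY is required.
SELECT [columns]
FROM my_table
ORDER BY ID
OFFSET 300009 ROWS FETCH NEXT 10 ROWS ONLY;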
When working with large tables, it is often a good idea to make use of Partitioning techniques available in SQL Server.
The rules of your partition function typically dictate that only a range of data can reside within a given partition. You could split your partitions by date range or ID, for example.
In order to select from a particular partition you would use a query similar to the following.
SELECT <Column Name1>, …
FROM <Table Name>
WHERE $PARTITION.<Partition Function Name>(<Column Name>) = <Partition Number>
Take a look at the following white paper for more detailed information on partitioning in SQL Server 2005.
http://msdn.microsoft.com/en-us/library/ms345146.aspx
I hope this helps; however, please feel free to pose further questions.
Cheers, John
I use wrapper queries to select around the core query and then just isolate the ROW numbers that I wish to take from it - this allows SQL Server to do all the heavy lifting inside the CORE query and pass out only the small part of the table that I have requested. All you need to do is pass the [start_row_variable] and the [end_row_variable] into the SQL query.
NOTE: The order clause is specified OUTSIDE the core query, as [sql_order_clause].
w1 and w2 are the wrapper (derived) tables that SQL Server builds around the core query.
SELECT
    w1.*
FROM (
    SELECT w2.*,
           ROW_NUMBER() OVER ([sql_order_clause]) AS ROW
    FROM (
        -- CORE QUERY START
        SELECT [columns]
        FROM [table_name]
        WHERE [sql_string]
        -- CORE QUERY END
    ) AS w2
) AS w1
WHERE ROW BETWEEN [start_row_variable] AND [end_row_variable]
This method has hugely optimized my database systems. It works very well.
IMPORTANT: Be sure to always explicitly specify only the exact columns you wish to retrieve in the core query, as fetching unnecessary data in these CORE queries can cost you serious overhead.
Use TOP to select only a limited amount of rows, like:
SELECT TOP 10 * FROM my_table WHERE ID >= 300010
Add an ORDER BY if you want the results in a particular order.
To be efficient there has to be an index on the ID column.