SQL Data Sampling

We have had a request to provide some data to an external company.
They require only a sample of the data. Simple, right? Wrong.
Here are their sampling criteria:
Total number of records divided by 720 (the required sample size) gives the sampling interval (if the result is a fraction, round down to a whole number).
Halve the sampling interval to get the starting point.
Return each record by adding on the sampling interval.
EXAMPLE:
10,000 records: sampling interval = 13 (10,000/720 = 13.89, rounded down)
Starting point = 6 (13/2 = 6.5, rounded down)
Return records 6, 19 (6+13), 32 (19+13), 45 (32+13), etc.
Can someone please tell me whether (and how) something like this is possible in SQL?
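The sampling rule in the question is plain integer arithmetic, so it can be checked outside the database first. Below is a minimal Python sketch, assuming records are numbered from 1. The function name systematic_sample and the fallback for an interval of 1 are illustrative assumptions, not part of the question.

```python
def systematic_sample(total_records, sample_size=720):
    """Return the 1-based record numbers picked by the rule in the question."""
    interval = total_records // sample_size   # round down to a whole number
    if interval < 1:
        raise ValueError("fewer records than the requested sample size")
    start = interval // 2                     # half the interval, rounded down
    if start < 1:
        start = 1   # assumption: fall back to the first record when interval is 1
    return list(range(start, total_records + 1, interval))

picked = systematic_sample(10_000)
print(picked[:4], len(picked))   # [6, 19, 32, 45] 769
```

Because the interval is rounded down, the rule yields 769 records for 10,000 rows rather than exactly 720.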

If you have access to ROW_NUMBER(), you can do this relatively easily.
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER (ORDER BY a, b, c, d) AS record_id,
*
FROM
yourTable
)
AS data
WHERE
(record_id + 360) % 720 = 0
ROW_NUMBER() gives all your data a sequential identifier (this is important because the id must be unique and must NOT have ANY gaps). It also defines the order you want the data in (ORDER BY a, b, c, d).
With that id, if you use modulo (often the % operator), you can test whether the record is the 720th record, the 1440th record, etc. (because 720 % 720 = 0).
Then, if you offset the id value by 360, you can change the starting point of your result set.
EDIT
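After re-reading the question, I see you don't want every 720th record, but 720 uniformly selected records.
As such, replace 720 with (SELECT COUNT(*) / 720 FROM yourTable)
And replace 360 with (SELECT (COUNT(*) / 720) / 2 FROM yourTable)
Note that adding the offset makes the first selected record interval - offset, which is 7 rather than 6 when the interval is 13. To start exactly at the halved interval, test record_id % interval = offset instead.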
EDIT
Ignoring the rounding conditions will allow a result of exactly 720 records. This requires using non-integer values, and the result of the modulo being less than 1.
WHERE
(record_id + (SELECT COUNT(*) FROM yourTable) / 1440.0)
%
((SELECT COUNT(*) FROM yourTable) / 720.0)
<
1.0
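The non-integer condition above can be checked numerically. Here is a small Python sketch that emulates the WHERE clause for ids 1..N; the function name fractional_sample is hypothetical, and Python's float modulo stands in for the SQL % operator on decimal values.

```python
def fractional_sample(total_records, sample_size=720):
    """Emulate (record_id + N/1440.0) % (N/720.0) < 1.0 for ids 1..N."""
    interval = total_records / sample_size   # non-integer sampling interval
    offset = interval / 2                    # N / 1440.0 when sample_size is 720
    return [rid for rid in range(1, total_records + 1)
            if (rid + offset) % interval < 1.0]

ids = fractional_sample(10_000)
print(len(ids), ids[:3])
```

For 10,000 rows this keeps exactly 720 ids, at the cost of uneven gaps (13 or 14) between neighbouring ids.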

declare @sample_size int, @starting_point int
select @sample_size = 200
select top (@sample_size) col1, col2, col3, col4
from (
select *, row_number() over (order by col1, col2) as row
from your_table
) t
where (row % ((select count(*) from your_table) / @sample_size)) - ((select count(*) from your_table) / @sample_size / 2) = 0
This works in SQL Server 2005 and later.
TOP (@variable) limits the number of rows (because of integer rounding, the WHERE condition alone might return more rows than needed), and ROW_NUMBER() numbers and orders the rows.
Working example: https://data.stackexchange.com/stackoverflow/query/62315/sql-data-sampling with the code below:
declare @tab table (id int identity(1,1), col1 varchar(3), col2 varchar(3))
declare @i int
set @i = 0
while @i <= 1000
begin
insert into @tab
select 'aaa', 'bbb'
set @i = @i+1
end
declare @sample_size int
select @sample_size = 123
select ((select count(*) from @tab) / @sample_size) as sample_interval
select top (@sample_size) *
from (
select *, row_number() over (order by col1, col2, id desc) as row
from @tab
) t
where (row % ((select count(*) from @tab) / @sample_size)) - ((select count(*) from @tab) / @sample_size / 2) = 0

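SQL Server has a built-in clause for sampling, TABLESAMPLE. Note that it samples whole data pages, so the number of rows returned is approximate and the rows are not spaced at a fixed interval.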
SELECT FirstName, LastName
FROM Person.Person
TABLESAMPLE (10 PERCENT) ;

You can use rank to get a row-number. The following code will create 10000 records in a table, then select the 6th, 19th, 32nd, etc, for a total of 769 rows.
CREATE TABLE Tbl (
Data varchar (255)
)
GO
DECLARE @i int
SET @i = 0
WHILE (@i < 10000)
BEGIN
INSERT INTO Tbl (Data) VALUES (CONVERT(varchar(255), NEWID()))
SET @i = @i + 1
END
GO
DECLARE @interval int
DECLARE @start int
DECLARE @total int
SELECT @total = COUNT(*),
@start = FLOOR(COUNT(*) / 720) / 2,
@interval = FLOOR(COUNT(*) / 720)
FROM Tbl
PRINT 'Start record: ' + CAST(@start as varchar(10))
PRINT 'Interval: ' + CAST(@interval as varchar(10))
SELECT rank, Data
FROM (
SELECT rank()
OVER (ORDER BY t.Data) as rank, t.Data AS Data
FROM Tbl t) q
WHERE ((rank + 1) + @start) % @interval = 0

Related

Selecting data from table where sum of values in a column equal to the value in another column

Sample data:
create table #temp (id int, qty int, checkvalue int)
insert into #temp values (1,1,3)
insert into #temp values (2,2,3)
insert into #temp values (3,1,3)
insert into #temp values (4,1,3)
According to data above, I would like to show exact number of lines from top to bottom where sum(qty) = checkvalue. Note that checkvalue is same for all the records all the time. Regarding the sample data above, the desired output is:
Id Qty checkValue
1 1 3
2 2 3
Because 1+2=3 and no more data is needed to show. If checkvalue was 4, we would show the third record: Id:3 Qty:1 checkValue:4 as well.
This is the code I am using to handle this problem. It works well.
declare @checkValue int = (select top 1 checkvalue from #temp);
declare @counter int = 0, @sumValue int = 0;
while @sumValue < @checkValue
begin
set @counter = @counter + 1;
set @sumValue = @sumValue + (
select t.qty from
(
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY id ASC) AS rownumber,
id,qty,checkvalue
FROM #temp
) AS foo
WHERE rownumber = @counter
) t
)
end
declare @sql nvarchar(255) = 'select top '+cast(@counter as varchar(5))+' * from #temp'
EXECUTE sp_executesql @sql, N'@counter int', @counter = @counter;
However, I am not sure if this is the best way to deal with it and wonder if there is a better approach. There are many professionals here and I'd like to hear from them about what they think about my approach and how we can improve it. Any advice would be appreciated!

Try this:
select id, qty, checkvalue from (
select t1.*,
sum(t1.qty) over (partition by t2.id) [sum]
from #temp [t1] join #temp [t2] on t1.id <= t2.id
) a where checkvalue = [sum]
Smart self-join is all you need :)

For SQL Server 2012, and onwards, you can easily achieve this using ROWS BETWEEN in your OVER clause and the use of a CTE:
WITH Running AS(
SELECT *,
SUM(qty) OVER (ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningQty
FROM #temp t)
SELECT id, qty, checkvalue
FROM Running
WHERE RunningQty <= checkvalue;
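To see what the running-total filter keeps, here is a small Python sketch over the sample data from the question. It assumes qty is never negative, so the running total never decreases; the tuple layout (id, qty, checkvalue) is chosen for illustration.

```python
from itertools import accumulate

rows = [(1, 1, 3), (2, 2, 3), (3, 1, 3), (4, 1, 3)]  # (id, qty, checkvalue)

ordered = sorted(rows)                                # ORDER BY id
running = accumulate(qty for _, qty, _ in ordered)    # running SUM(qty)
selected = [row for row, total in zip(ordered, running) if total <= row[2]]
print(selected)   # [(1, 1, 3), (2, 2, 3)]
```

If no prefix of the rows sums to exactly checkvalue, the `<=` filter returns the rows that stay below it, which may or may not be the behaviour you want.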

One basic improvement is to try & reduce the no. of iterations. You're incrementing by 1, but if you repurpose the logic behind binary searching, you'd get something close to this:
DECLARE @RoughAverage int = 1 -- Some arbitrary value. The closer it is to the real average, the faster things should be.
DECLARE @CheckValue int = (SELECT TOP 1 checkvalue FROM #temp)
DECLARE @Sum int = 0
WHILE 1 = 1 -- Refer to BREAK below.
BEGIN
SELECT TOP (@RoughAverage) @Sum = SUM(qty) OVER(ORDER BY id)
FROM #temp
ORDER BY id
IF @Sum = @CheckValue
BREAK -- Indicating you reached your objective.
ELSE
SET @RoughAverage = @CheckValue - @Sum -- Most likely incomplete like this.
END

For SQL 2008 you can use recursive cte. Top 1 with ties limits result with first combination. Remove it to see all combinations
with cte as (
select
*, rn = row_number() over (order by id)
from
#temp
)
, rcte as (
select
i = id, id, qty, sumV = qty, checkvalue, rn
from
cte
union all
select
a.id, b.id, b.qty, a.sumV + b.qty, a.checkvalue, b.rn
from
rcte a
join cte b on a.rn + 1 = b.rn
where
a.sumV < b.checkvalue
)
select
top 1 with ties id, qty, checkvalue
from (
select
*, needed = max(case when sumV = checkvalue then 1 else 0 end) over (partition by i)
from
rcte
) t
where
needed = 1
order by dense_rank() over (order by i)

SQL Server - loop through table and update based on count

I have a SQL Server database. I need to loop through a table to get the count of each value in the column 'RevID'. Each value should appear in the table only a certain number of times, for example 125 times. If the count for a value is greater or less than 125, I need to update the column so that the counts for all RevID values (there are over 25 different values) are close to 125 (it is OK to be a few off).
For example, if the count of RevID = 'A2' is 45 and the count of RevID = 'B2' is 165, then I need to update RevID so that the 45 count increases and the 165 count decreases until both are close to 125.
This is what I have so far:
DECLARE @i INT = 1,
@RevCnt INT = SELECT RevId, COUNT(RevId) FROM MyTable group by RevId
WHILE(@RevCnt >= 50)
BEGIN
UPDATE MyTable
SET RevID= (SELECT COUNT(RevID) FROM MyTable)
WHERE RevID < 50)
@i = @i + 1
END
I have also played around with a cursor and an INSTEAD OF trigger. Any idea on how to achieve this? Thanks for any input.

Okay, I came back to this because I found it interesting, even though there are clearly some business rules that you, I, and others are not seeing. If you want to distribute the values evenly and arbitrarily, there are a few ways you could do it, such as building recursive common table expressions (CTEs) or temp tables. Here is the approach I decided to try. I used one temp table because SQL Server produced inconsistent results about every tenth run when the main logic table was a CTE, and the temp table seems to have cleared that up. The script spreads RevId evenly and arbitrarily, randomly assigning any remainder of (# of records / # of RevIds) among the RevIds. It doesn't rely on a unique ID; it works dynamically over the row numbers it creates. Just strip out the test data and you have what you most likely want, though rebuilding the table or values would probably be easier.
--Build Some Test Data
DECLARE @Table AS TABLE (RevId VARCHAR(10))
DECLARE @C AS INT = 1
WHILE @C <= 400
BEGIN
IF @C <= 200
BEGIN
INSERT INTO @Table (RevId) VALUES ('A1')
END
IF @C <= 170
BEGIN
INSERT INTO @Table (RevId) VALUES ('B2')
END
IF @C <= 100
BEGIN
INSERT INTO @Table (RevId) VALUES ('C3')
END
IF @C <= 400
BEGIN
INSERT INTO @Table (RevId) VALUES ('D4')
END
IF @C <= 1
BEGIN
INSERT INTO @Table (RevId) VALUES ('E5')
END
SET @C = @C + 1
END
--save starting counts of test data to temp table to compare with later
IF OBJECT_ID('tempdb..#StartingCounts') IS NOT NULL
BEGIN
DROP TABLE #StartingCounts
END
SELECT
RevId
,COUNT(*) as Occurences
INTO #StartingCounts
FROM
@Table
GROUP BY
RevId
ORDER BY
RevId
/************************ This is the main method **********************************/
--clear temp table that is the main processing logic
IF OBJECT_ID('tempdb..#RowNumsToChange') IS NOT NULL
BEGIN
DROP TABLE #RowNumsToChange
END
--figure out how many records there are and how many there should be for each RevId
;WITH cteTargetNumbers AS (
SELECT
RevId
--,COUNT(*) as RevIdCount
--,SUM(COUNT(*)) OVER (PARTITION BY 1) / COUNT(*) OVER (PARTITION BY 1) +
--CASE
--WHEN ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY NEWID()) <=
--SUM(COUNT(*)) OVER (PARTITION BY 1) % COUNT(*) OVER (PARTITION BY 1)
--THEN 1
--ELSE 0
--END as TargetNumOfRecords
,SUM(COUNT(*)) OVER (PARTITION BY 1) / COUNT(*) OVER (PARTITION BY 1) +
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY NEWID()) <=
SUM(COUNT(*)) OVER (PARTITION BY 1) % COUNT(*) OVER (PARTITION BY 1)
THEN 1
ELSE 0
END - COUNT(*) AS NumRecordsToUpdate
FROM
@Table
GROUP BY
RevId
)
, cteEndRowNumsToChange AS (
SELECT *
,SUM(CASE WHEN NumRecordsToUpdate > 1 THEN NumRecordsToUpdate ELSE 0 END)
OVER (PARTITION BY 1 ORDER BY RevId) AS ChangeEndRowNum
FROM
cteTargetNumbers
)
SELECT
*
,LAG(ChangeEndRowNum,1,0) OVER (PARTITION BY 1 ORDER BY RevId) as ChangeStartRowNum
INTO #RowNumsToChange
FROM
cteEndRowNumsToChange
;WITH cteOriginalTableRowNum AS (
SELECT
RevId
,ROW_NUMBER() OVER (PARTITION BY RevId ORDER BY (SELECT 0)) as RowNumByRevId
FROM
@Table t
)
, cteRecordsAllowedToChange AS (
SELECT
o.RevId
,o.RowNumByRevId
,ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY (SELECT 0)) as ChangeRowNum
FROM
cteOriginalTableRowNum o
INNER JOIN #RowNumsToChange t
ON o.RevId = t.RevId
AND t.NumRecordsToUpdate < 0
AND o.RowNumByRevId <= ABS(t.NumRecordsToUpdate)
)
UPDATE o
SET RevId = u.RevId
FROM
cteOriginalTableRowNum o
INNER JOIN cteRecordsAllowedToChange c
ON o.RevId = c.RevId
AND o.RowNumByRevId = c.RowNumByRevId
INNER JOIN #RowNumsToChange u
ON c.ChangeRowNum > u.ChangeStartRowNum
AND c.ChangeRowNum <= u.ChangeEndRowNum
AND u.NumRecordsToUpdate > 0
IF OBJECT_ID('tempdb..#RowNumsToChange') IS NOT NULL
BEGIN
DROP TABLE #RowNumsToChange
END
/***************************** End of Main Method *******************************/
-- Compare the results and clean up
;WITH ctePostUpdateResults AS (
SELECT
RevId
,COUNT(*) as AfterChangeOccurences
FROM
@Table
GROUP BY
RevId
)
SELECT *
FROM
#StartingCounts s
INNER JOIN ctePostUpdateResults r
ON s.RevId = r.RevId
ORDER BY
s.RevId
IF OBJECT_ID('tempdb..#StartingCounts') IS NOT NULL
BEGIN
DROP TABLE #StartingCounts
END
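The target arithmetic in cteTargetNumbers (an even share per RevId, with the remainder handed out one row at a time) can be sketched in Python as follows. This is an illustration only: the function name balance_deltas is hypothetical, and the leftover rows go to the first RevIds in sorted order here, whereas the script above picks them randomly with NEWID().

```python
def balance_deltas(counts):
    """Return {RevId: rows to gain (+) or give up (-)} for an even spread."""
    total, groups = sum(counts.values()), len(counts)
    base, remainder = divmod(total, groups)
    deltas = {}
    for position, rev_id in enumerate(sorted(counts)):
        # Assumption: leftover rows go to the first RevIds in sorted order.
        target = base + (1 if position < remainder else 0)
        deltas[rev_id] = target - counts[rev_id]
    return deltas

counts = {"A1": 200, "B2": 170, "C3": 100, "D4": 400, "E5": 1}
deltas = balance_deltas(counts)
print(deltas)
```

The deltas always sum to zero, so every row given up by an overrepresented RevId has an underrepresented RevId to move to.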

Since you've given no rules for how you'd like the balance to operate we're left to speculate. Here's an approach that would find the most overrepresented value and then find an underrepresented value that can take on the entire overage.
I have no idea how optimal this is and it will probably run in an infinite loop without more logic.
declare @balance int = 125;
declare @cnt_over int;
declare @cnt_under int;
declare @revID_overrepresented varchar(32);
declare @revID_underrepresented varchar(32);
declare @rowcount int = 1;
while @rowcount > 0
begin
select top 1 @revID_overrepresented = RevID, @cnt_over = count(*)
from T
group by RevID
having count(*) > @balance
order by count(*) desc
select top 1 @revID_underrepresented = RevID, @cnt_under = count(*)
from T
group by RevID
having count(*) < @balance - @cnt_over
order by count(*) desc
update top (@cnt_over - @balance) T
set RevId = @revID_underrepresented
where RevId = @revID_overrepresented;
set @rowcount = @@rowcount;
end

The problem is I don't even know what you mean by balance...You say it needs to be evenly represented but it seems like you want it to be 125. 125 is not "even", it is just 125.
I can't tell what you are trying to do, but I'm guessing this is not really an SQL problem. But you can use SQL to help. Here is some helpful SQL for you. You can use this in your language of choice to solve the problem.
Find the rev values and their counts:
SELECT RevID, COUNT(*)
FROM MyTable
GROUP BY RevID
Update @X rows (with RevID of value @RevID) to a new value @NewValue:
UPDATE TOP (@X) MyTable
SET RevID = @NewValue
WHERE RevID = @RevID
Using these two queries you should be able to apply your business rules (which you never specified) in a loop or whatever to change the data.

How to optimize SQL Server code?

I have a table with the columns: Id, time, value.
First step: Given a signal id, start time, and end time as input parameters, I want to extract the rows with that signal id whose time is between the start time and the end time.
Second: Assume I have selected 100 rows in the first step. Given another input parameter which is max_num, I want to further select max_num samples out of 100 rows but in a uniform manner. For example, if max_num is set to 10, then I will select 1, 11, 21, .. 91 rows out of 100 rows.
I am not sure if the stored procedure below is optimal, if you find any inefficiencies of the code, please point that out to me and give some suggestion.
create procedure data_selection
@sig_id bigint,
@start_time datetime2,
@end_time datetime2,
@max_num float
AS
BEGIN
declare @tot float
declare @step int
declare @selected table (id int primary key identity not null, Date datetime2, Value real)
-- first step
insert into @selected (Date, Value) select Date, Value from Table
where Id = @sig_id
and Date >= @start_time and Date <= @end_time
order by Date
-- second step
select @tot = count(1) from @selected
set @step = ceiling(@tot / @max_num)
select * from @selected
where id % @step = 1
END

EDITED to calculate step on the fly. I had first thought this was an argument.
;with data as (
select row_number() over (order by [Date]) as rn, *
from Table
where Id = @sig_id and Date between @start_time and @end_time
), calc as (
select cast(ceiling(max(rn) / @max_num) as int) as step from data
)
select * from data cross apply calc as c
where (rn - 1) % step = 0 --and rn <= (@max_num - 1) * step + 1
Or I guess you can just order/filter by your identity value as you already had it:
;with calc as (select cast(ceiling(max(id) / @max_num) as int) as step from @selected)
select * from @selected cross apply calc as c
where (id - 1) % step = 0 --and id <= (@max_num - 1) * step + 1
I think that because you're rounding step up with ceiling, you'll easily find scenarios where you get fewer rows than @max_num. You might want to round down instead: case when floor(max(rn) / @max_num) = 0 then 1 else floor(max(rn) / @max_num) end as step?
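The effect of rounding the step up versus down is easy to check with a few numbers. A minimal Python sketch, assuming 100 selected rows and a max_num of 30 (both values chosen only for illustration; the function name is hypothetical):

```python
import math

def sampled_row_numbers(total_rows, max_num, round_up):
    """1-based row numbers kept by the filter (rn - 1) % step = 0."""
    ratio = total_rows / max_num
    step = math.ceil(ratio) if round_up else max(1, math.floor(ratio))
    return [rn for rn in range(1, total_rows + 1) if (rn - 1) % step == 0]

with_ceiling = sampled_row_numbers(100, 30, round_up=True)
with_floor = sampled_row_numbers(100, 30, round_up=False)
print(len(with_ceiling), len(with_floor))   # 25 34
```

Rounding up returns fewer rows than requested, while rounding down overshoots, which is why the commented-out cap on rn (or a TOP clause) is needed alongside the floor variant.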

Create evenly spaced out sequence in SQL

So this should be fairly simple, and I'm sure there's an embarrassingly easy solution I'm missing, but here goes:
I want to create a grid of numbers based on two numeric variables.
More specifically, I want to select the 5th and 95th percentile of each variable, then cut up the difference between those two values into 100 parts, and then group by those.
So basically what I need is in pseudocode
(5th percentile)+(95th percentile-5th percentile)/100*[all numbers from 0 to 100]
I can pick out the 5th and 95th percentile with the following query:
SELECT Min(subq.lat) as latitude, percentile FROM
(SELECT round(latitude,2) as lat, ntile(100) OVER (order by latitude desc) as
'percentile' FROM table ORDER BY latitude DESC) AS subq
where percentile in (5,95)
group by 2
And I can create a list of numbers from 0 to 100 as well.
But how to combine those two is something that's a little beyond me.
Help would be much appreciated.

I'm not entirely sure I follow what you're after, but it could be as simple as looping through 1-100, performing your calculation for each set and inserting them into a results table:
CREATE TABLE #Results (Counter_ INT, Calc_Value FLOAT)
GO
DECLARE @intFlag INT
SET @intFlag = 1
WHILE (@intFlag <=100)
BEGIN
--Do Stuff
INSERT INTO #Results
SELECT Counter_ = @intFlag
,Calc_Value = (calculation logic)/@intFlag
SET @intFlag = @intFlag + 1
END
GO
The 'Do Stuff' portion gets executed for each value 1 - 100, the (Calculation logic) would obviously need to be replaced with whatever logic you use. If those values are constant for 1-100, you could set them as variables so they don't have to run 100 times. Roughly:
CREATE TABLE #Results (Counter_ INT, Calc_Value FLOAT)
GO
DECLARE @intFlag INT, @Percentile_Value FLOAT = (Calculation Logic)
SET @intFlag = 1
WHILE (@intFlag <=100)
BEGIN
--Do Stuff
INSERT INTO #Results
SELECT Counter_ = @intFlag
,Calc_Value = @Percentile_Value/@intFlag
SET @intFlag = @intFlag + 1
END
GO

You can do what you want to do with window functions. Basically, do the percentile calculation with row_number() and the total count.
Here is an example:
SELECT lat,
((seqnum - 0.05 * cnt) / (0.95 * cnt - 0.05 * cnt)) * 100 as NewPercentile
FROM (SELECT round(latitude,2) as lat,
row_number() over (order by latitude) as seqnum,
count(*) over () as cnt
FROM table
ORDER BY latitude DESC
) AS subq
where seqnum between 0.05 * cnt and 0.95 * cnt

Is there a way to split the results of a select query into two equal halfs?

I need a solution for a select query in Sql Server 2005.
I'd like to have a query returning two ResultSets each of which holding exactly half of all records matching a certain criteria. I tried using TOP 50 PERCENT in conjunction with an Order By but if the number of records in the table is odd, one record will show up in both resultsets. I don't want to have any record duplicated over the recordsets. Example:
I've got a simple table with TheID (PK) and TheValue fields (varchar(10)) and 5 records. Skip the where clause for now.
SELECT TOP 50 PERCENT * FROM TheTable ORDER BY TheID asc
results in the selected id's 1,2,3
SELECT TOP 50 PERCENT * FROM TheTable ORDER BY TheID desc
results in the selected id's 3,4,5
3 is a dup. In real life of course the queries are fairly complicated with a ton of where clauses and subqueries.

SQL Server 2005 and similar:
select *, ntile(2) over(order by theid) as tile_nr from thetable
ntile(n) allocates the output into n segments, each of the same size (give or take rounding when the number of rows isn't divisible by n). So this produces the output:
1 | value1 | 1
2 | value2 | 1
3 | value3 | 1
4 | value4 | 2
5 | value5 | 2
If you just want the top or bottom half, you need to put this into a subquery, e.g.:
select theid, thevalue from (
select theid, thevalue, ntile(2) over(order by theid) as tile_nr from thetable
) x
where x.tile_nr = 1
will return the top half, and similarly use x.tile_nr = 2 for the bottom half
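The way ntile(n) shares out rows when the count is not divisible by n can be sketched in Python. This only illustrates the allocation rule (earlier tiles receive the extra rows); the helper name ntile mirrors the SQL function for readability.

```python
def ntile(row_count, tiles):
    """Return the tile number for each of row_count ordered rows."""
    base, extra = divmod(row_count, tiles)
    result = []
    for tile in range(1, tiles + 1):
        size = base + (1 if tile <= extra else 0)   # earlier tiles take the extra rows
        result.extend([tile] * size)
    return result

tiles = ntile(5, 2)
print(tiles)   # [1, 1, 1, 2, 2]
```

Each row gets exactly one tile number, so no record can appear in both halves.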

You could use these two queries:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY TheID) AS rn FROM TheTable
) T1
WHERE rn % 2 = 0
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY TheID) AS rn FROM TheTable
) T1
WHERE rn % 2 = 1

If this is SQL Server 2000, then I'd be inclined to find the PK of the middle value like so:
Declare @MiddleId int
Set @MiddleId = (
Select TOP 1 TheId
From (
Select TOP 50 PERCENT TheId
From Table
Order By TheId ASC
) As T
Order By TheId DESC
)
Select ...
From Table
Where TheId <= @MiddleId
Select ...
From Table
Where TheId > @MiddleId
With SQL Server 2005, I'd be inclined to do the same, but you can use a CTE:
;With NumProjects As
(
Select TheId, ROW_NUMBER() OVER (ORDER BY TheId ASC ) As Num
From Table
)
Select @MiddleId = TheId
From NumProjects
Where Num = CEILING( (Select Count(*) From Table) / 2.0 )

try this:
DECLARE @CountOf int,@Top int,@Bottom int
SELECT @CountOf=COUNT(*) FROM YourTable
SET @Top=@CountOf/2
SET @Bottom=@CountOf-@Top
SELECT TOP (@Top) * FROM YourTable ORDER BY 1 asc --assumes column 1 is your PK
SELECT TOP (@Bottom) * FROM YourTable ORDER BY 1 desc --assumes column 1 is your PK

Here is another solution:
You would need to use a temp table to hold the first 50% as below:
select top 50 percent *
into #YourTempTable
from TheTable
-- The below would give the first half
select * from #YourTempTable
-- The below would give the other half
select * from TheTable where TheID not in (select TheID from #YourTempTable)

This is the query I found useful (after modifications, of course):
DECLARE @numberofitemsperpage INT
DECLARE @numberofpages INT
DECLARE @currentpage int
DECLARE @countRecords float
SET @countRecords = (Select COUNT(*) From sz_hold_visitsData)
-- Excel can hold approximately ONE MILLION records at a time.
if @countRecords >= 1000000
SET @numberofitemsperpage = 500000
ELSE IF @countRecords < 1000000 AND @countRecords >= 500000
SET @numberofitemsperpage = 250000
ELSE IF @countRecords < 500000 AND @countRecords >= 100000
SET @numberofitemsperpage = 50000
ELSE
SET @numberofitemsperpage = 10000
DECLARE @numberofpages_deci float
SET @numberofpages_deci = @countRecords / @numberofitemsperpage
SET @numberofpages = CEILING(@numberofpages_deci)
Select @countRecords AS countRecords, @numberofitemsperpage AS numberofitemsperpage, @numberofpages_deci AS numberofpages_deci, @numberofpages AS numberofpagesFnl
SET @currentpage = 0
WHILE @currentpage < @numberofpages
BEGIN
SELECT a.* FROM (SELECT row_number() OVER (ORDER BY person_ID) AS ROW, * FROM sz_hold_visitsData) a
WHERE ROW >= @currentpage * @numberofitemsperpage + 1 AND Row <= (@currentpage + 1) * @numberofitemsperpage
IF @@ROWCOUNT = 0 BREAK
SET @currentpage = @currentpage + 1
END
In this extract, "sz_hold_visitsData" is a table in my database, whilst "person_ID" is a column therein.
You can also further modify the script to output to file:
DECLARE @numberofitemsperpage INT
DECLARE @numberofpages INT
DECLARE @currentpage int
DECLARE @countRecords float
SET @countRecords = (Select COUNT(*) From sz_hold_visitsData)
-- Excel can hold approximately ONE MILLION records at a time.
if @countRecords >= 1000000
SET @numberofitemsperpage = 500000
ELSE IF @countRecords < 1000000 AND @countRecords >= 500000
SET @numberofitemsperpage = 250000
ELSE IF @countRecords < 500000 AND @countRecords >= 100000
SET @numberofitemsperpage = 50000
ELSE
SET @numberofitemsperpage = 10000
DECLARE @numberofpages_deci float
SET @numberofpages_deci = @countRecords / @numberofitemsperpage
SET @numberofpages = CEILING(@numberofpages_deci)
Select @countRecords AS countRecords, @numberofitemsperpage AS numberofitemsperpage, @numberofpages_deci AS numberofpages_deci, @numberofpages AS numberofpagesFnl
DECLARE @sevrName nvarchar(50)
SET @sevrName = '.\sql14'
DECLARE @outputFile nvarchar(500)
SET @currentpage = 0
WHILE @currentpage < @numberofpages
BEGIN
--SELECT a.* FROM (SELECT row_number() OVER (ORDER BY person_ID) AS ROW, * FROM sz_hold_visitsData) a WHERE ROW >= @currentpage * @numberofitemsperpage + 1 AND Row <= (@currentpage + 1) * @numberofitemsperpage
SET @outputFile = 'C:\PSM\outVisits_' + convert(nvarchar(50), @currentpage) + '.csv'
--Select @outputFile --TEST
DECLARE @cmd_ varchar(500) = 'sqlcmd -S ' + @sevrName + ' -E -Q "SELECT a.* FROM (SELECT row_number() OVER (ORDER BY person_ID) AS ROW, * FROM sz_hold_visitsData) a WHERE ROW >= ' + CONVERT(nvarchar(500), @currentpage * @numberofitemsperpage + 1) + ' AND Row <= ' + CONVERT(nvarchar(500), ((@currentpage + 1) * @numberofitemsperpage)) + '" -s "," -o ' + @outputFile + ' ' -- "C:\PSM\outVisits.csv" '
EXEC xp_cmdshell @cmd_
IF @@ROWCOUNT = 0 BREAK
SET @currentpage = @currentpage + 1
END
Hope this helps.
