Could someone please advise on how to repeat a query if it returns no results? I am trying to draw a random person from the DB using RAND, but only if that person's number was not drawn previously (that info is stored in the column "allready_drawn").
At the moment, when the query hits a number that was drawn before, the second condition ("is null") filters it out and no result is displayed.
I need the query to re-run until it comes up with a number.
DECLARE @min INTEGER;
DECLARE @max INTEGER;
set @min = (select top 1 id from [dbo].[persons] where sector = 8 order by id ASC);
set @max = (select top 1 id from [dbo].[persons] where sector = 8 order by id DESC);
select
ordial,
name_surname
from [dbo].[persons]
where id = ROUND(((@max - @min) * RAND() + @min), 0) and allready_drawn is NULL
There are two possible outcomes: either a person is returned, or nothing comes back because the drawn number was already used.
Any suggestion is appreciated and I would like to thank everyone in advance.
Just try this. It removes the "id" filter, so you only have to run the query once:
select TOP 1
ordial,
name_surname
from [dbo].[persons]
where allready_drawn is NULL
ORDER BY NEWID()
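Whichever approach you use, after a successful draw you would presumably also flag the row so the next draw excludes it. A minimal sketch, where @drawnOrdial is a placeholder for the ordial you just selected:
-- mark the person we just drew so "allready_drawn is NULL" filters them out next time
update [dbo].[persons]
set allready_drawn = 1
where ordial = @drawnOrdial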
@gbn, that's a correct solution, but it may be too expensive. For very large tables with dense keys, randomly picking a key value between the min and max and re-picking until you find a match is also fair, and cheaper than sorting the whole table.
Also, there's a bug in the original post: the min and max rows will be selected only half as often as the others, because each maps to a smaller interval. To fix it, generate a random number from @min to @max + 1 and truncate rather than round. That way each interval [N, N+1) maps to N, ensuring a fair chance for every N.
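To see the bias concretely, suppose @min = 1 and @max = 3: ROUND maps [1.0, 1.5) to 1 and (2.5, 3.0] to 3, half-width intervals, while the full-width [1.5, 2.5) maps to 2. Truncating over [@min, @max + 1) gives every id a full-width interval; a small sketch with those made-up bounds:
declare @min int = 1
declare @max int = 3 + 1 -- note the + 1 on the upper bound
-- ROUND's optional third argument: non-zero means truncate instead of round
select ROUND(((@max - @min) * RAND() + @min), 0, 1) as fair_random_id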
For this selection method, here's how to repeat until you find a match.
--drop table persons
go
create table persons(id int, ordial int, name_surname varchar(2000), sector int, allready_drawn bit)
insert into persons(id,ordial,name_surname,sector, allready_drawn)
values (1,1,'foo',8,null),(2,2,'foo2',8,null),(100,100,'foo100',8,null)
go
declare @min int = (select top 1 id from [dbo].[persons] where sector = 8 order by id ASC);
declare @max int = 1 + (select top 1 id from [dbo].[persons] where sector = 8 order by id DESC);
set nocount on
declare @results table(ordial int, name_surname varchar(2000))
declare @i int = 0
declare @selected bit = 0
while @selected = 0
begin
set @i += 1
insert into @results(ordial, name_surname)
select
ordial,
name_surname
from [dbo].[persons]
where id = ROUND(((@max - @min) * RAND() + @min), 0, 1) and allready_drawn is NULL
if @@ROWCOUNT > 0
begin
select *, @i as tries from @results
set @selected = 1
end
end
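One caveat: if every person has already been drawn, this loop never finds a match and spins forever. A guard on the try count avoids that; a sketch to place inside the loop (the 10000 cap is an arbitrary assumption):
-- hypothetical safety valve: give up after too many misses
if @i > 10000
begin
select 'no undrawn person found' as warning
set @selected = 1
end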
Related
I am using MS SQL Server 2005 at work to build a database. I have been told that most tables will hold 1,000,000 to 500,000,000 rows of data in the near future after it is built... I have not worked with datasets this large, and most of the time I don't even know what I should be considering in order to figure out the best way to set up schemas, queries, and so on.
So... I need to know the start and end dates for something, and a value that is associated with an ID during that time frame. So we can set the table up two different ways:
create table xxx_test2 (id int identity(1,1), groupid int, dt datetime, i int)
create table xxx_test2 (id int identity(1,1), groupid int, start_dt datetime, end_dt datetime, i int)
Which is better? How do I define "better"? I filled the first table with about 100,000 rows of data, and it takes about 10-12 seconds to transform it into the format of the second table, depending on the query...
select y.groupid,
y.dt as [start],
z.dt as [end],
(case when z.dt is null then 1 else 0 end) as latest,
y.i
from #x as y
outer apply (select top 1 *
from #x as x
where x.groupid = y.groupid and
x.dt > y.dt
order by x.dt asc) as z
or
http://consultingblogs.emc.com/jamiethomson/archive/2005/01/10/t-sql-deriving-start-and-end-date-from-a-single-effective-date.aspx
But... with the second table, to insert a new row I have to go look and see if there is a previous row, and if so, update its end date. So is it a question of performance when retrieving data vs. insert/update costs? It seems silly to store that end date twice, but maybe... not? What things should I be looking at?
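To make that concrete, inserting into the second table would have to look roughly like this (assuming the current row is the one with a NULL end_dt; @groupid, @newdt, and @newvalue are placeholders):
-- close out the current row for this group, if one exists
update xxx_test2
set end_dt = @newdt
where groupid = @groupid and end_dt is null
-- then insert the new current row
insert into xxx_test2 (groupid, start_dt, end_dt, i)
values (@groupid, @newdt, null, @newvalue)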
This is what I used to generate my fake data, if you want to play with it for some reason (if you change the maximum of the random number to something higher, it will generate the fake data a lot faster):
declare @dt datetime
declare @i int
declare @id int
set @id = 1
declare @rowcount int
set @rowcount = 0
declare @numrows int
while (@rowcount < 100000)
begin
set @i = 1
set @dt = getdate()
set @numrows = Cast(((5 + 1) - 1) * Rand() + 1 As tinyint)
while @i <= @numrows
begin
insert into #x values (@id, dateadd(d, @i, @dt), @i)
set @i = @i + 1
end
set @rowcount = @rowcount + @numrows
set @id = @id + 1
print @rowcount
end
For your purposes, I think option 2 is the way to go for table design. This gives you flexibility, and will save you tons of work.
Having the effective date and end date will let you write a query that returns only currently effective data, by having this in your where clause:
where getdate() between effectivedate and enddate
You can also then use it to join with other tables in a time-sensitive way.
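For example, a time-sensitive join might look like this sketch (the orders table and its columns are made up for illustration):
-- join each order to the row that was effective when the order was placed
select o.orderid, r.i
from orders as o
inner join xxx_test2 as r
on r.groupid = o.groupid
and o.orderdate between r.start_dt and r.end_dt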
Provided you set up the key properly and provide the right indexes, performance (on this table at least) should not be a problem.
For anyone who can use the LEAD analytic function of SQL Server 2012 (or Oracle, DB2, ...), retrieving data from the first table (which uses only one date column) would be much, much quicker than without this feature:
select
groupid,
dt "start",
lead(dt) over (partition by groupid order by dt) "end",
case when lead(dt) over (partition by groupid order by dt) is null
then 1 else 0 end "latest",
i
from x
I have the following T-SQL statement. I am trying to figure out how I can keep getting the results (100 rows at a time), store them in a variable (as I will have to add the totals after each select), and continue to select in a while loop until no more records are found, then return the variable totals to the calling function.
SELECT [OrderUser].OrderUserId, ISNULL(SUM(total.FileSize), 0), ISNULL(SUM(total.CompressedFileSize), 0)
FROM
(
SELECT DISTINCT TOP(100) ProductSize.OrderUserId, ProductSize.FileInfoId,
CAST(ProductSize.FileSize AS BIGINT) AS FileSize,
CAST(ProductSize.CompressedFileSize AS BIGINT) AS CompressedFileSize
FROM ProductSize WITH (NOLOCK)
INNER JOIN [Version] ON ProductSize.VersionId = [Version].VersionId
) AS total RIGHT OUTER JOIN [OrderUser] WITH (NOLOCK) ON total.OrderUserId = [OrderUser].OrderUserId
WHERE NOT ([OrderUser].isCustomer = 1 AND [OrderUser].isEndOrderUser = 0 OR [OrderUser].isLocation = 1)
AND [OrderUser].OrderUserId = 1
GROUP BY [OrderUser].OrderUserId
It depends on the clustered index: if it's on a numeric id, use the code below; if it's on a date, go in 10-60 minute increments. Keep an eye on the performance of other things, but the lovely part of this code is that you can start and stop at any time if you push the results to a permanent temp table (a real table, just used as a temp).
Here's a sample:
declare @count int
declare @batch int
declare @max int
create table #temp (id int identity(1,1) primary key, Batch int, value int)
select @max = max(OrderUserId), @count = 0, @batch = 1000 from [table]
while (@count < @max)
begin
insert into #temp (Batch, value)
select @count, Sum(stuffs)
from [table]
where OrderUserId >= @count
and OrderUserId < @count + @batch
set @count = @count + @batch
waitfor delay ('00:00:01')
Raiserror('On Batch %d', 0, 1, @count) with nowait /* will print progress */
end
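Once the loop finishes, the totals the calling code wants can be read straight from #temp, e.g.:
-- per-batch subtotals, then the grand total
select Batch, sum(value) as batch_total from #temp group by Batch
select sum(value) as grand_total from #temp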
I have a query that returns a large number of heavy rows.
When I transform these rows into a list of CustomObject I get a big memory peak, and this transformation is done by a custom .NET framework that I can't modify.
I need to retrieve fewer rows at a time, do "the transform" in two passes, and avoid the memory peak.
How can I split the result of a query in half? I need to do it in the DB layer. I thought of doing a "TOP count(*)/2", but how do I get the other half?
Thank you!
If you have an identity field in the table, select the even ids first, then the odd ones.
select * from Table where Id % 2 = 0
select * from Table where Id % 2 = 1
You should have roughly 50% rows in each set.
Here is another way to do it, from http://www.tek-tips.com/viewthread.cfm?qid=1280248&page=5. I think it's more efficient:
Declare @Rows Int
Declare @TopRows Int
Declare @BottomRows Int
Select @Rows = Count(*) From TableName
If @Rows % 2 = 1
Begin
Set @TopRows = @Rows / 2
Set @BottomRows = @TopRows + 1
End
Else
Begin
Set @TopRows = @Rows / 2
Set @BottomRows = @TopRows
End
Set RowCount @TopRows
Select * From TableName Order By DisplayOrder
Set RowCount @BottomRows
Select * From TableName Order By DisplayOrder DESC
--- old answer below ---
Is this a stored procedure call or dynamic SQL? Can you use temp tables?
If so, something like this would work:
select row_number() OVER(order by yourorderfield) as rowNumber, *
INTO #tmp
FROM dbo.yourtable
declare @rowCount int
SELECT @rowCount = count(1) from #tmp
SELECT * from #tmp where rowNumber <= @rowCount / 2
SELECT * from #tmp where rowNumber > @rowCount / 2
DROP TABLE #tmp
SELECT TOP 50 PERCENT WITH TIES ... ORDER BY SomeThing
then
SELECT TOP 50 PERCENT ... ORDER BY SomeThing DESC
However, unless you snapshot the data first, a row in the middle may slip through or be processed twice
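As a self-contained sketch, assuming a hypothetical table T sorted by a column SomeThing:
-- first half; WITH TIES keeps rows equal to the boundary value together
SELECT TOP 50 PERCENT WITH TIES * FROM T ORDER BY SomeThing
-- second half, approached from the other end
SELECT TOP 50 PERCENT * FROM T ORDER BY SomeThing DESC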
I don't think you should do that in SQL, unless you can accept the possibility of getting the same record twice.
I would do it in a "software" programming language, not SQL: Java, .NET, C++, etc...
I have the following problem that I would like to solve with Transact-SQL.
I have something like this
Start | End | Item
1 | 5 | A
3 | 8 | B
and I want to create something like
Start | End | Item-Combination
1 | 2 | A
3 | 5 | A-B
6 | 8 | B
For the Item-Combination concatenation I already thought of using the FOR XML statement. But in order to create the different new intervals... I really don't know how to approach it. Any idea?
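(By the FOR XML statement I mean the usual concatenation trick; a sketch on a throwaway table variable:)
declare @t table (Item char(1))
insert @t values ('A'), ('B')
-- builds '-A-B', then STUFF strips the leading separator, yielding 'A-B'
select stuff((select '-' + Item from @t for xml path('')), 1, 1, '')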
Thanks.
I had a very similar problem with some computer usage data. I had session data indicating login/logout times. I wanted to find the times (hour of day per day of week) that were the most in demand, that is, the hours where the most users were logged in. I ended up solving the problem client-side using hash tables. For each session, I would increment the bucket for a particular location corresponding to the day of week and hour of day for each day/hour for which the session was active. After examining all sessions the hash table values show the number of logins during each hour for each day of the week.
I think you could do something similar, keeping track of each item seen for each start/end value. You could then reconstruct the table by collapsing adjacent entries that have the same item combination.
And, no, I could not think of a way to solve my problem with SQL either.
This is a fairly typical range-finding problem, with the concatenation thrown in. Not sure if the following fits exactly, but it's a starting point. (Cursors are usually best avoided except in the small set of cases where they are faster than set-based solutions, so before the cursor haters get on me please note I use a cursor here on purpose because this smells to me like a cursor-friendly problem -- I typically avoid them.)
So if I create data like this:
CREATE TABLE [dbo].[sourceValues](
[Start] [int] NOT NULL,
[End] [int] NOT NULL,
[Item] [varchar](100) NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[sourceValues] WITH CHECK ADD CONSTRAINT [End_after_Start] CHECK (([End]>[Start]))
GO
ALTER TABLE [dbo].[sourceValues] CHECK CONSTRAINT [End_after_Start]
GO
declare @i int; set @i = 0;
declare @start int;
declare @end int;
declare @item varchar(100);
while @i < 1000
begin
set @start = ABS( CHECKSUM( newid() ) % 100 ) + 1; -- "random" int
set @end = @start + ( ABS( CHECKSUM( newid() ) % 10 ) ) + 2; -- bigger random int
set @item = char( ( ABS( CHECKSUM( newid() ) ) % 5 ) + 65 ); -- random letter A-E
print @start; print @end; print @item;
insert into sourceValues( Start, [End], Item ) values ( @start, @end, @item );
set @i += 1;
end
Then I can treat the problem like this: each "Start" AND each "End" value represents a change in the collection of current Items, either adding one or removing one, at a certain time. In the code below I alias that notion as "event," meaning an Add or Remove. Each start or end is like a time, so I use the term "tick." If I make a collection of all the events, ordered by event time (Start AND End), I can iterate through it while keeping a running tally in an in-memory table of all the Items that are in play. Each time the tick value changes, I take a snapshot of that tally:
declare @tick int;
declare @lastTick int;
declare @event varchar(100);
declare @item varchar(100);
declare @concatList varchar(max);
declare @currentItemsList table ( Item varchar(100) );
create table #result ( Start int, [End] int, Items varchar(max) );
declare eventsCursor CURSOR FAST_FORWARD for
select tick, [event], item from (
select start as tick, 'Add' as [event], item from sourceValues as adds
union all
select [end] as tick, 'Remove' as [event], item from sourceValues as removes
) as [events]
order by tick
set @lastTick = 1
open eventsCursor
fetch next from eventsCursor into @tick, @event, @item
while @@FETCH_STATUS = 0
BEGIN
if @tick != @lastTick
begin
set @concatList = ''
select @concatList = @concatList + case when len( @concatList ) > 0 then '-' else '' end + Item
from @currentItemsList
insert into #result ( Start, [End], Items ) values ( @lastTick, @tick, @concatList )
end
if @event = 'Add' insert into @currentItemsList ( Item ) values ( @item );
else if @event = 'Remove' delete top ( 1 ) from @currentItemsList where Item = @item;
set @lastTick = @tick;
fetch next from eventsCursor into @tick, @event, @item;
END
close eventsCursor
deallocate eventsCursor
select * from #result order by start
drop table #result
Using a cursor for this special case allows just one "pass" through the data, like a running totals problem. Itzik Ben-Gan has some great examples of this in his SQL 2005 books.
Thanks a lot for all the answers. For the moment I have found a way of doing it: since I'm dealing with a data warehouse and I have a Time dimension, I can do some joins with the Time dimension in the style of "inner join DimTime t on t.date between f.start_date and f.end_date".
It's not very good from the performance point of view, but it seems to be working for me.
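Roughly, the join looks like this (f stands for my fact table with one row per range; fact_ranges is a made-up name):
-- expand each range into one row per covered date
select t.date, f.Item
from fact_ranges as f
inner join DimTime as t
on t.date between f.start_date and f.end_date
-- grouping by t.date and concatenating Item then gives the per-date combinations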
I'll give the onupdatecascade implementation a try, to see which suits me better.
This exactly emulates and solves the mentioned problem:
-- prepare problem, it can have many rows with overlapping ranges
declare @range table
(
Item char(1) primary key,
[Start] int,
[End] int
)
insert @range select 'A', 1, 5
insert @range select 'B', 3, 8
-- unroll the ranges into a helper table
declare @usage table
(
Item char(1),
Number int
)
declare
@Start int,
@End int,
@Item char(1)
declare table_cur cursor local forward_only read_only for
select [Start], [End], Item from @range
open table_cur
fetch next from table_cur into @Start, @End, @Item
while @@fetch_status = 0
begin
with
Num(Pos) as -- generate the numbers used
(
select cast(@Start as int)
union all
select cast(Pos + 1 as int) from Num where Pos < @End
)
insert
@usage
select
@Item,
Pos
from
Num
option (maxrecursion 0) -- just in case more than 100
fetch next from table_cur into @Start, @End, @Item
end
close table_cur
deallocate table_cur
-- compile overlaps
;
with
overlaps as
(
select
Number,
(
select
Item + '-'
from
@usage as i
where
o.Number = i.Number
for xml path('')
)
as Items
from
@usage as o
group by
Number
)
select
min(Number) as [Start],
max(Number) as [End],
left(Items, len(Items) - 1) as Items -- beautify
from
overlaps
group by
Items
I use ROW_NUMBER() to do paging with my website content, and when you hit the last page it times out because SQL Server takes too long to complete the search.
There's already an article concerning this problem, but there seems to be no perfect solution yet.
http://weblogs.asp.net/eporter/archive/2006/10/17/ROW5F00NUMBER28002900-OVER-Not-Fast-Enough-With-Large-Result-Set.aspx
When I click the last page on StackOverflow, it takes less than a second to return the page, which is really fast. I'm wondering whether they have really fast database servers, or just a solution to the ROW_NUMBER() problem?
Any idea?
Years back, while working with SQL Server 2000, which did not have this function, we had the same issue.
We found this method, which at first look seemed like the performance would be bad, but it blew us out of the water.
Try this out
DECLARE @Table TABLE(
ID INT PRIMARY KEY
)
--insert some values, as many as required.
DECLARE @I INT
SET @I = 0
WHILE @I < 100000
BEGIN
INSERT INTO @Table SELECT @I
SET @I = @I + 1
END
DECLARE @Start INT,
@Count INT
SELECT @Start = 10001,
@Count = 50
SELECT *
FROM (
SELECT TOP (@Count)
*
FROM (
SELECT TOP (@Start + @Count)
*
FROM @Table
ORDER BY ID ASC
) TopAsc
ORDER BY ID DESC
) TopDesc
ORDER BY ID
The base logic of this method relies on the SET ROWCOUNT expression to both skip the unwanted rows and fetch the desired ones:
DECLARE @Sort /* the type of the sorting column */
SET ROWCOUNT @StartRow
SELECT @Sort = SortColumn FROM Table ORDER BY SortColumn
SET ROWCOUNT @PageSize
SELECT ... FROM Table WHERE SortColumn >= @Sort ORDER BY SortColumn
The issue is well covered in this CodeProject article, including scalability graphs.
TOP is supported on SQL Server 2000, but only with static values, e.g. no "TOP (@Var)", only "TOP 200".
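On SQL Server 2000, SET ROWCOUNT does accept a variable, so one workaround looks like this (a minimal sketch; the Persons table and @PageSize are made up):
DECLARE @PageSize INT
SET @PageSize = 200
SET ROWCOUNT @PageSize -- a variable row limit works here even on SQL 2000
SELECT * FROM Persons ORDER BY ID
SET ROWCOUNT 0 -- reset so later statements are unaffected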