Is there a good way to create a queue like structure in SQL Server?
Requirements:
When I insert rows, I want them to default to the bottom of the queue
When I select rows, I want to easily be able to get the top of the queue
Here's the tough one: I want to be able to easily move something up the queue and reorder the rest. Example: move item 5 up to number 1; items 1-4 then become 2-5.
A simple identity column would work for requirements 1 and 2, but how would I handle 3?
Solution
I ended up implementing the solution from @roger-wolf below.
One difference, I used a trigger rather than a stored procedure to renumber. Here's my trigger code:
CREATE TRIGGER [dbo].[TR_Queue]
ON [dbo].[Queue]
AFTER INSERT, DELETE, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Get the current max value in priority
    DECLARE @maxPriority INT = COALESCE((SELECT MAX([priority]) FROM [dbo].[Queue]), 0);

    WITH newValues AS (
        -- Renumber by priority, starting at 1
        SELECT [queueID]
              ,ROW_NUMBER() OVER(ORDER BY [priority] ASC) AS [priority]
        FROM (
            -- Pretend all NULLs are greater than the previous max priority
            SELECT [queueID]
                  ,COALESCE([priority], @maxPriority + 1) AS [priority]
            FROM [dbo].[Queue]
        ) AS tbl
    )
    UPDATE q
    SET q.[priority] = newValues.[priority]
    FROM [dbo].[Queue] AS q
    INNER JOIN newValues
            ON q.[queueID] = newValues.[queueID];
END
This works well for me, as the queue is always relatively small and infrequently updated, so I don't have to worry about the performance of the trigger.
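With the trigger in place, moving an item is a single UPDATE. A minimal sketch, assuming you want queueID 5 moved to the top: give it a priority below the current minimum and let the trigger renumber everything else down.

UPDATE [dbo].[Queue]
SET [priority] = 0  -- below the current minimum of 1; the trigger renumbers this row to 1
WHERE [queueID] = 5;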
Use a float column for prioritisation and an approach similar to Celko trees:
If you have items with priorities 1, 2, and 3 and the last needs to become second, calculate an average between its new neighbours, 1.5 in this example;
If another one needs to become second, its priority would be 1.25. This can go on for quite a while;
When displaying queued items by their priority, use row_number() instead of float values in UI;
If items become too close together (say, 1e-10 or less), have a stored procedure ready to renumber them as integers.
The only deficiency I see here is that it becomes a bit more difficult to find the N-th item in the middle of the queue, when it's neither first nor last. If you don't need that, the approach should work.
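A minimal sketch of the averaging step in T-SQL, assuming a table Queue(queueID INT, priority FLOAT) and that the row @moveID should land immediately after the row @afterID (all names here are placeholders):

DECLARE @moveID INT = 3, @afterID INT = 1;  -- placeholder IDs

DECLARE @p1 FLOAT = (SELECT [priority] FROM Queue WHERE queueID = @afterID);
DECLARE @p2 FLOAT = (SELECT MIN([priority]) FROM Queue
                     WHERE [priority] > @p1 AND queueID <> @moveID);

-- New priority is the average of the two neighbours; if there is no
-- following row, just step past the last one instead.
UPDATE Queue
SET [priority] = (@p1 + COALESCE(@p2, @p1 + 2)) / 2.0
WHERE queueID = @moveID;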
You could add a Priority column of type DateTime: when you mark a row as a priority row, set the current date-time in the Priority column, and then use that column as part of your ORDER BY criteria.
I had a similar requirement in a past project; here is what I did (and it worked):
Add column update_at_utc of type datetime2
When inserting, set update_at_utc = GETUTCDATE()
When retrieving, order by update_at_utc
When moving a row in the queue, for example between rows 3 and 4, simply take the average of the update_at_utc values of those two rows and use it as the update_at_utc of the row being moved (see the sketch after Note 2).
Note 1: Point 4 assumes that the frequency of inserts and of moving rows up/down the queue is such that the datetime2 type has sufficient resolution. For example, if you insert 2 rows 1 millisecond apart and then try to move 1000 rows between those 2 rows, the datetime2 resolution will be insufficient (https://learn.microsoft.com/en-us/sql/t-sql/data-types/datetime2-transact-sql?view=sql-server-2017). In that case, moving rows up/down the queue needs to be more complicated. When moving a row N places lower down:
Remember update_at_utc of the row N places lower down
For all rows between the current and the new position: assign each row's update_at_utc to the preceding row's update_at_utc
Assign update_at_utc of the row being moved to the date remembered in step 1 above.
Note 2: I suggest UTC dates instead of local dates to avoid issues during a daylight saving switch.
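A minimal sketch of the averaging in point 4, with assumed names (MyQueue, id): datetime2 values cannot be averaged directly, so add half of the gap to the earlier timestamp.

DECLARE @beforeID INT = 3, @afterID INT = 4, @moveID INT = 7;  -- placeholder IDs

DECLARE @t1 DATETIME2 = (SELECT update_at_utc FROM MyQueue WHERE id = @beforeID);
DECLARE @t2 DATETIME2 = (SELECT update_at_utc FROM MyQueue WHERE id = @afterID);

-- DATEDIFF_BIG needs SQL Server 2016+; the CAST assumes the half-gap
-- fits in an int number of microseconds (about 35 minutes).
UPDATE MyQueue
SET update_at_utc = DATEADD(MICROSECOND, CAST(DATEDIFF_BIG(MICROSECOND, @t1, @t2) / 2 AS INT), @t1)
WHERE id = @moveID;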
Related
I need to display a random last name of a person who entered into an employment contract in a specified month, using the RAND function.
go
CREATE OR ALTER function [dbo].[User_Surname]
(@mont int)
returns nvarchar(50)
begin
    Declare @surname nvarchar(50)
    Set @surname = (Select top(1) surname from dbo.Tenants
                    inner join dbo.lease_agreements on Tenants.tenant_code = lease_agreements.tenant_code
                    where MONTH(lease_agreements.rental_start_date) = @mont
                      and dbo.Tenants.tenant_code = (select * from randNumber))
    return @surname
end
go
select dbo.User_Surname (1)
go
create or alter view randNumber as
Select FLOOR((RAND() * (MAX(tenant_code + 1) - 1)) + 1) as value from Tenants
So what if tenant #42 has been removed? If the random number function returns 42, then your query will yield nothing.
To fix this problem, one approach, which would be quite difficult to implement correctly, would involve a row-sequence-number column: an integer which increments sequentially and contains no gaps. To avoid a gap when a row is deleted, you must pick the last row in the table and give it the row-sequence-number of the deleted row. Consistently doing so without ever forgetting is a tough proposition, and achieving it without concurrency problems when rows are deleted concurrently is tougher still. Furthermore, since the last row may be re-sequenced, you cannot use an SQL SEQUENCE for issuing row sequence numbers, unless your RDBMS supports counting down on a sequence, which is yet another tough proposition.
A better approach would be to create a random number N between zero and the number of rows, rather than up to the maximum row id, and then pick the Nth row from the table. That would be something like SELECT BOTTOM 1 FROM (SELECT TOP N FROM... in pseudo-SQL.
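In SQL Server, that "pick the Nth row" step can be written with OFFSET/FETCH (2012+); a minimal sketch against the question's Tenants table:

-- @n is a random row position in [0, row count).
DECLARE @n INT = FLOOR(RAND() * (SELECT COUNT(*) FROM dbo.Tenants));

SELECT surname
FROM dbo.Tenants
ORDER BY tenant_code
OFFSET @n ROWS FETCH NEXT 1 ROW ONLY;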
An SQL-only solution (involving no stored procedures) would be very inefficient. It would involve joining the table of interest with a random-number function (just real random numbers between 0.0 and 1.0), essentially creating a new table which also contains a random-number field, then ordering by the random field and using TOP 1 to get the first row. To achieve this, your RDBMS would perform a full table scan and create an entire new sorted temporary table, and it would do that each time you ask for a row at random, so it would be preposterously inefficient.
A performance improvement on the above idea would be to permanently add the random number column to each row, (and to issue a new random number between 0.0 and 1.0 to each row later inserted,) and then use a SEQUENCE for issuing sequential row index numbers, so that each time you want a new random row you pick the next number N from the sequence, you compute its modulus by the number of rows in the table, and you get the Nth row from the table sorted by random-number-column. It will probably be a good idea to make that random number column indexed. The problem with this approach is that it does not truly yield records at random, it yields all records in random order. Truly yielding records at random means that the same row might be yielded twice in two successive queries. This approach will only yield a record again once all other records have first been yielded.
As you want only one tenant, use ORDER BY NEWID() (in SQL Server, RAND() is evaluated once per query, so it would not shuffle the rows).
As always with randomness, you could also get the same tenant 100 times in a row, especially when you have only a small number of tenants that fit the bill.
This will never be fast, as the table needs to be fully scanned,
but at least you should have an index on (tenant_code, rental_start_date) so that it is faster to select the matching tenants.
CREATE OR ALTER function [dbo].[User_Surname]
(@mont int)
returns nvarchar(50)
begin
    Declare @surname nvarchar(50)
    Set @surname = (Select top(1) surname from dbo.Tenants
                    inner join dbo.lease_agreements on Tenants.tenant_code = lease_agreements.tenant_code
                    where MONTH(lease_agreements.rental_start_date) = @mont
                    ORDER BY NEWID())
    return @surname
end
My requirement is as follows.
(a) I already have a sequence created, and one table (let's assume employee, having id, name, etc.).
(b) Somehow my sequence got corrupted, and the current value of the sequence is no longer in sync with the max value of the employee table's id column.
Now I want to reset my sequence to the max value of the id column of the employee table. I know this can be done easily using PL/SQL or a stored procedure, but I want to write a plain query which will do the following:
1- Fetch the max value of id and the current value of my sequence, take the difference, and add that difference to the sequence by using increment by. (Here the current value of the sequence is less than the max value of the id column.)
You change the values of a sequence with the 'ALTER SEQUENCE' command.
To restart the sequence with a new base value, you need to drop and recreate it.
I do not think you can do this with a straightforward SELECT query.
Here is the Oracle 10g documentation for ALTER SEQUENCE.
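For example, a minimal sketch of the drop-and-recreate route (the sequence name and start value are placeholders; read MAX(id) from the table first and plug it in):

DROP SEQUENCE employee_seq;
CREATE SEQUENCE employee_seq START WITH 101 INCREMENT BY 1;  -- 101 = MAX(id) + 1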
You can't change the increment from plain SQL as alter sequence is DDL, so you need to increment it multiple times, one by one. This would increment the sequence as many times as the highest ID you currently have:
select your_sequence.nextval
from (
select max(id) as max_id
from your_table
) t
connect by level < t.max_id;
SQL Fiddle demo (fudged a bit as the sequence isn't reset if the schema is cached).
If you have a high max ID, though, that might be inefficient, although as a one-off adjustment it probably doesn't matter. You can't refer to the current sequence value in a subquery or CTE, but you could look at the USER_SEQUENCES view to get a rough guide of how far out you are to begin with, and reduce the number of calls to within double the cache size (depending on how many unused values the cache holds):
select your_sequence.nextval
from (
select max(id) as max_id
from your_table
) t
connect by level <= (
select t.max_id + us.cache_size + 1 - us.last_number
from user_sequences us
where sequence_name = 'YOUR_SEQUENCE'
);
SQL Fiddle.
With low existing ID values the second one might do more work, but with higher values you can see the second comes into its own a bit.
I'm currently working on a project that needs a process that assigns "control numbers" to some records. It also needs to be able to run at a later date, picking up records without a control number that have changed and assigning an unused control number to them. These control numbers are preassigned by an outside entity and are 9 digits long. You usually get a range depending on how many records your company is estimated to generate. For example, one of the companies estimated they would need 50, so we were assigned the range 790123401 to 790123450.
The problem: right now I'm using cursors to assign these numbers. For each individual record, I check whether the first number in the sequence is already taken in the table; if it is, I increment the number and re-check. This goes on for every record in the table. One of the companies has 17,000 records, which means that for each record I could at worst be iterating 17,000 times if all numbers have been taken.
I really don't mind all the repetition on the initial run, since the first run will assign control numbers to a lot of records. My problem is that if a record later changes and should now have a control number, re-running the process would mean going through each available number until an unused one is found.
I've seen numerous examples on how to use sequences without cursors, but most are specific to Oracle. I'm using SQL Server 2005 for this particular project.
Suggestions?
You are looking for all unassigned numbers in a range? If so, you can outer join onto a numbers table. The example below uses a CTE to create one on the fly; I would suggest a permanent one containing at least 17,000 numbers if that is the max size of your range.
DECLARE @StartRange int, @EndRange int
SET @StartRange = 790123401
SET @EndRange = 790123450;

WITH YourTable(ControlNumber) AS
(
    SELECT 790123401 UNION ALL
    SELECT 790123402 UNION ALL
    SELECT 790123403 UNION ALL
    SELECT 790123406
),
Nums(N) AS
(
    SELECT @StartRange
    UNION ALL
    SELECT N + 1
    FROM Nums
    WHERE N < @EndRange
)
SELECT N
FROM Nums
WHERE NOT EXISTS(SELECT *
                 FROM YourTable
                 WHERE ControlNumber = N)
OPTION (MAXRECURSION 0)
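A sketch of building the permanent numbers table suggested above (dbo.Numbers is an assumed name); when comparing against control numbers, offset N by @StartRange - 1:

-- Built once; 17,000 rows covers the largest range mentioned.
SELECT TOP (17000) IDENTITY(int, 1, 1) AS N
INTO dbo.Numbers
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

ALTER TABLE dbo.Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY (N);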
I have a table with unique values within it and once a stored procedure is called, I use the following code within a sub-query to get a random value from the table:
SELECT TOP 1 UniqueID FROM UniqueValues
WHERE InitiatingID is NULL
ORDER BY NewID() ASC
However, I have noticed that now and then (I'm guessing two calls running simultaneously cause it) I manage to retrieve the same unique value twice, which causes some issues within the program.
Is there any way (preferably not locking the table) to make the unique values ID generation completely unique - or unique enough to not affect two simultaneous calls? As a note, I need to keep the unique values and cannot use GUIDs directly here.
Thanks,
Kyle
Edit for clarification:
I am buffering the unique values. That's what the WHERE InitiatingID is NULL is all about. As a value gets picked out of the query, the InitiatingID is set and therefore cannot be used again until released. The problem is that in the milliseconds of that process setting the InitiatingID it seems that the value is getting picked up again, thus harming the process.
Random implies that you will get the same value twice randomly.
Why not use IDENTITY columns?
I wrote a blog post about manual ID generation some days ago here. Maybe that helps.
What you're doing isn't really generating random unique values - which has a low probability of generating duplicates if you use the appropriate routines, but randomly selecting one item from a population - which, depending on the size of your population, will have a much higher chance of repeat occurrences. In fact, given enough repeated drawing, there will occasionally be repeats - if there weren't, it wouldn't be truly random.
If what you want is to never draw the same unique id twice in a row, you might consider buffering the 'old' unique id somewhere and discarding your draw if it matches (or adding a WHERE <> currentlyDrawnUniqueID clause).
What about using UPDATE with the OUTPUT clause to select the UniqueId and set InitiatingId all at once? http://msdn.microsoft.com/en-US/library/ms177564(v=SQL.90).aspx
Something like: (Though I don't have SQL Server handy, so not tested.)
DECLARE @UniqueIDTable TABLE
(
    UniqueId int
)

UPDATE UniqueValues
SET InitiatingID = @InitiatingID
OUTPUT INSERTED.UniqueId INTO @UniqueIDTable
WHERE UniqueID =
    (SELECT TOP 1 UniqueID FROM UniqueValues
     WHERE InitiatingID IS NULL
     ORDER BY NEWID() ASC)
AND InitiatingID IS NULL
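The caller can then read the claimed value back out of the table variable:

SELECT UniqueId FROM @UniqueIDTable;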
I want to use a database table as a queue. I want to insert in it and take elements from it in the inserted order (FIFO). My main consideration is performance because I have thousands of these transactions each second. So I want to use a SQL query that gives me the first element without searching the whole table. I do not remove a row when I read it.
Does SELECT TOP 1 ..... help here?
Should I use any special indexes?
I'd use an IDENTITY field as the primary key to provide the uniquely incrementing ID for each queued item, and stick a clustered index on it. This would represent the order in which the items were queued.
To keep the items in the queue table while you process them, you'd need a "status" field to indicate the current status of a particular item (e.g. 0=waiting, 1=being processed, 2=processed). This is needed to prevent an item being processed twice.
When processing items in the queue, you'd need to find the next item in the table NOT currently being processed, in such a way as to prevent multiple processes picking up the same item at the same time, as demonstrated below. Note the table hints UPDLOCK and READPAST, which you should be aware of when implementing queues.
e.g. within a sproc, something like this:
DECLARE @NextID INTEGER

BEGIN TRANSACTION

-- Find the next queued item that is waiting to be processed
SELECT TOP 1 @NextID = ID
FROM MyQueueTable WITH (UPDLOCK, READPAST)
WHERE Status = 0
ORDER BY ID ASC

-- If we've found one, mark it as being processed
IF @NextID IS NOT NULL
    UPDATE MyQueueTable SET Status = 1 WHERE ID = @NextID

COMMIT TRANSACTION

-- If we've got an item from the queue, return it to whatever is going to process it
IF @NextID IS NOT NULL
    SELECT * FROM MyQueueTable WHERE ID = @NextID
If processing an item fails, do you want to be able to try it again later? If so, you'll need to either reset the status back to 0 or something. That will require more thought.
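For example, a hedged sketch that returns stalled items to the waiting state, assuming a LastUpdated column (not in the table above) is maintained by the processing code:

UPDATE MyQueueTable
SET Status = 0
WHERE Status = 1
  AND LastUpdated < DATEADD(MINUTE, -10, GETUTCDATE());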
Alternatively, don't use a database table as a queue, but something like MSMQ - just thought I'd throw that in the mix!
If you do not remove your processed rows, then you are going to need some sort of flag that indicates that a row has already been processed.
Put an index on that flag, and on the column you are going to order by.
Partition your table over that flag, so the dequeued transactions are not clogging up your queries.
If you really did get 1,000 messages every second, that would result in 86,400,000 rows a day. You might want to think of some way to clean up old rows.
Everything depends on your database engine/implementation.
For me, simple queues on tables with the following columns:
id / task / priority / date_added
usually works.
I used priority and task to group tasks, and in the case of a duplicated task I chose the one with the higher priority.
And don't worry - for modern databases "thousands" is nothing special.
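A minimal sketch of that table shape in T-SQL (all names assumed):

CREATE TABLE task_queue (
    id INT IDENTITY(1,1) PRIMARY KEY,
    task NVARCHAR(200) NOT NULL,
    priority INT NOT NULL DEFAULT 0,
    date_added DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);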
This will not be any trouble at all as long as you use something to keep track of the datetime of the insert. See here for the MySQL options. The question is whether you only ever need the most recently submitted item or whether you need to iterate. If you need to iterate, grab a chunk with an ORDER BY statement, loop through it, and remember the last datetime so that you can use it when you grab the next chunk.
Perhaps adding LIMIT 1 to your select statement would help, forcing the return after a single match.
Since you don't delete the records from the table, you need to have a composite index on (processed, id), where processed is the column that indicates if the current record had been processed.
The best thing would be to create a partitioned table for your records and make the PROCESSED field the partitioning key. This way, you can keep three or more local indexes.
However, if you always process the records in id order and have only two states, updating a record would mean just taking the record from the first leaf of the index and appending it to the last leaf.
The currently processed record would always have the least id of all unprocessed records and the greatest id of all processed records.
Create a clustered index over a date (or autoincrement) column. This will keep the rows in the table roughly in index order and allow fast index-based access when you ORDER BY the indexed column. Using TOP X (or LIMIT X, depending on your RDMBS) will then only retrieve the first x items from the index.
Performance warning: you should always review the execution plans of your queries (on real data) to verify that the optimizer doesn't do unexpected things. Also try to benchmark your queries (again on real data) to be able to make informed decisions.
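A minimal sketch of that layout (table and column names assumed):

CREATE TABLE dbo.QueueTable (
    ID INT IDENTITY(1,1) NOT NULL,
    Payload NVARCHAR(MAX) NULL,
    CONSTRAINT PK_QueueTable PRIMARY KEY CLUSTERED (ID)
);

-- The first X items come straight off the clustered index.
SELECT TOP (10) * FROM dbo.QueueTable ORDER BY ID;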
I had the same general question of "how do I turn a table into a queue" and couldn't find the answer I wanted anywhere.
Here is what I came up with for Node/SQLite/better-sqlite3.
Basically just modify the inner WHERE and ORDER BY clauses for your use case.
// Assumes better-sqlite3; `status` is an external map of status constants.
const crypto = require("crypto");

module.exports.pickBatchInstructions = (db, batchSize) => {
  const buf = crypto.randomBytes(8); // Create a unique batch identifier
  const q_pickBatch = `
    UPDATE
      instructions
    SET
      status = '${status.INSTRUCTION_INPROGRESS}',
      run_id = '${buf.toString("hex")}',
      mdate = datetime(datetime(), 'localtime')
    WHERE
      id IN (SELECT id
             FROM instructions
             WHERE
               status IS NOT '${status.INSTRUCTION_COMPLETE}'
               AND run_id IS NULL
             ORDER BY
               length(targetpath), id
             LIMIT ${batchSize});
  `;
  db.prepare(q_pickBatch).run(); // Change the status and set the run id

  const q_getInstructions = `
    SELECT
      *
    FROM
      instructions
    WHERE
      run_id = '${buf.toString("hex")}'
  `;
  const rows = db.prepare(q_getInstructions).all(); // Get all rows with this batch id
  return rows;
};
A very easy solution that avoids transactions, locks, etc. is to use the change tracking mechanism (not change data capture). It uses versioning for each added/updated/removed row, so you can track what changes happened after a specific version.
So, you persist the last version and query the new changes.
If a query fails, you can always go back and query data from the last version.
Also, if you don't want to get all changes with one query, you can get the top N ordered by version and store the greatest version you received, to use when you query again.
See this for an example: Using Change Tracking in SQL Server 2008
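A hedged sketch of that pattern in T-SQL, assuming change tracking is enabled on a dbo.Queue table whose primary key is queueID:

DECLARE @lastVersion BIGINT = 0;  -- the version persisted after the previous query

-- Everything that changed since @lastVersion, with the operation type.
SELECT ct.queueID, ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_VERSION
FROM CHANGETABLE(CHANGES dbo.[Queue], @lastVersion) AS ct;

-- Persist this value for the next query.
SELECT CHANGE_TRACKING_CURRENT_VERSION();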