Select records from a specific key onwards

Select records from a specific key onwards - sql

I have a table that has more than three trillion records
The main key of this table is guid
As below
GUID Value mid id
0B821574-8E85-4FB7-8047-553393E385CB 4 51 15
716F74B0-80D8-4869-86B4-99FF9EB10561 0 510 153
7EBA2C31-FFC8-4071-B11A-9E2B7ED16B2B 2 5 3
85491F90-E4C6-4030-B1E5-B9CA36238AE2 1 58 7
F04FA30C-0C35-4B9F-A01C-708C0189815D 20 50 13
guid is primary key
I want to select 10 records from where the key is equal to, for example, 85491F90-E4C6-4030-B1E5-B9CA36238AE2

You can use order by and top. Assuming that guid defines the ordering of the rows:
select top (10) t.*
from mytable t
where guid >= '85491F90-E4C6-4030-B1E5-B9CA36238AE2'
order by guid
If the ordering is defined in an other column, say id (that should be unique as well), then you would use a correlated subquery for filterig:
select top (10) t.*
from mytable t
where id >= (select id from mytable t1 where guid = '85491F90-E4C6-4030-B1E5-B9CA36238AE2')
order by id

To read data onward You can use OFFSET .. FETCH in the ORDER BY since MS SQL Server 2012. According learn.microsoft.com something like this:
-- Declare and set the variables for the OFFSET and FETCH values.
DECLARE #StartingRowNumber INT = 1
, #RowCountPerPage INT = 10;
-- Create the condition to stop the transaction after all rows have been returned:
WHILE (SELECT COUNT(*) FROM mytable) >= #StartingRowNumber
BEGIN
-- Run the query until the stop condition is met:
SELECT *
FROM mytable WHERE guid = '85491F90-E4C6-4030-B1E5-B9CA36238AE2'
ORDER BY id
OFFSET #StartingRowNumber - 1 ROWS
FETCH NEXT #RowCountPerPage ROWS ONLY;
-- Increment #StartingRowNumber value:
SET #StartingRowNumber = #StartingRowNumber + #RowCountPerPage;
CONTINUE
END;
In the real world it will not be enough, because another processes could (try) read or write data in your table at the same time.
Please, read documentation, for example, search for "Running multiple queries in a single transaction" in the https://learn.microsoft.com/en-us/sql/t-sql/queries/select-order-by-clause-transact-sql
Proper indexes for fields id and guid must to be created/applied to provide performance

Related

SQL Query Create isDuplicate Column with IDs

I have a SQL Server 2005 database I'm working with. For the query I am using, I want to add a custom column that can start at any number and increment based on the row entry number.
For example, I start at number 10. Each row in my results will have an incrementing number 10, 11, 12, etc..
This is an example of the SELECT statement I would be using.
int customVal = 10;
SELECT
ID, customVal++
FROM myTable
The format of the above is clearly wrong, but it is conceptually what I am looking for.
RESULTS:
ID CustomColumn
-------------------
1 10
2 11
3 12
4 13
How can I go about implementing this kind functionality?
I cannot find any reference to incrementing variables within results. Is this the case?
EDIT: The customVal number will be pulled from another table. I.e. probably do a Select statement into the customVal variable. You cannot assume the the ID column will be any usable values.
The CustomColumn will be auto-incrementing starting at the customVal.

Use the ROW_NUMBER ranking function - http://technet.microsoft.com/en-us/library/ms186734.aspx
DECLARE #Offset INT = 9
SELECT
ID
, ROW_NUMBER() OVER (ORDER BY ID) + #Offset
FROM
Table

Get all missing values between two limits in SQL table column

I am trying to assign ID numbers to records that are being inserted into an SQL Server 2005 database table. Since these records can be deleted, I would like these records to be assigned the first available ID in the table. For example, if I have the table below, I would like the next record to be entered at ID 4 as it is the first available.
| ID | Data |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 5 | ... |
The way that I would prefer this to be done is to build up a list of available ID's via an SQL query. From there, I can do all the checks within the code of my application.
So, in summary, I would like an SQL query that retrieves all available ID's between 1 and 99999 from a specific table column.

First build a table of all N IDs.
declare #allPossibleIds table (id integer)
declare #currentId integer
select #currentId = 1
while #currentId < 1000000
begin
insert into #allPossibleIds
select #currentId
select #currentId = #currentId+1
end
Then, left join that table to your real table. You can select MIN if you want, or you could limit your allPossibleIDs to be less than the max table id
select a.id
from #allPossibleIds a
left outer join YourTable t
on a.id = t.Id
where t.id is null

Don't go for identity,
Let me give you an easy option while i work on a proper one.
Store int from 1-999999 in a table say Insert_sequence.
try to write an Sp for insertion,
You can easly identify the min value that is present in your Insert_sequence and not in
your main table, store this value in a variable and insert the row with ID from variable..
Regards
Ashutosh Arya

You could also loop through the keys. And when you hit an empty one Select it and exit Loop.
DECLARE #intStart INT, #loop bit
SET #intStart = 1
SET #loop = 1
WHILE (#loop = 1)
BEGIN
IF NOT EXISTS(SELECT [Key] FROM [Table] Where [Key] = #intStart)
BEGIN
SELECT #intStart as 'FreeKey'
SET #loop = 0
END
SET #intStart = #intStart + 1
END
GO
From there you can use the key as you please. Setting a #intStop to limit the loop field would be no problem.

Why do you need a table from 1..999999 all information you need is in your source table. Here is a query which give you minimal ID to insert in gaps.
It works for all combinations:
(2,3,4,5) - > 1
(1,2,3,5) - > 4
(1,2,3,4) - > 5
SQLFiddle demo
select min(t1.id)+1 from
(
select id from t
union
select 0
)
t1
left join t as t2 on t1.id=t2.id-1
where t2.id is null

Many people use an auto-incrementing integer or long value for the Primary Key of their tables, and it is often called ID or MyEntityID or something similar. This column, since it's just an auto-incrementing integer, often has nothing to do with the data being stored itself.
These types of "primary keys" are called surrogate keys. They have no meaning. Many people like these types of IDs to be sequential because it is "aesthetically pleasing", but this is a waste of time and resources. The database could care less about which IDs are being used and which are not.
I would highly suggest you forget trying to do this and just leave the ID column auto-increment. You should also create an index on your table that is made up of those (subset of) columns that can uniquely identify each record in the table (and even consider using this index as your primary key index). In rare cases where you would need to use all columns to accomplish that, that is where an auto-incrementing primary key ID is extremely useful—because it may not be performant to create an index over all columns in the table. Even so, the database engine could care less about this ID (e.g. which ones are in use, are not in use, etc.).
Also consider that an integer-based ID has a maximum total of 4.2 BILLION IDs. It is quite unlikely that you'll exhaust the supply of integer-based IDs in any short amount of time, which further bolsters the argument for why this sort of thing is a waste of time and resources.

SQL Server custom record sort in table, allowing to delete records

SQL Server table with custom sort has columns: ID (PK, auto-increment), OrderNumber, Col1, Col2..
By default an insert trigger copies value from ID to OrderNumber as suggested here.
Using some visual interface, user can sort records by incrementing or decrementing OrderNumber values.
However, how to deal with records being deleted in the meantime?
Example:
Say you add records with PK ID: 1,2,3,4,5 - OrderNumber receives same values. Then you delete records with ID=4,ID=5. Next record will have ID=6 and OrderNumber will receive the same value. Having a span of 2 missing OrderNumbers would force user to decrement record with ID=6 like 3 times to change it's order (i.e. 3x button pressed).
Alternatively, one could insert select count(*) from table into OrderNumber, but it would allow to have several similar values in table, when some old rows are deleted.
If one doesn't delete records, but only "deactivate" them, they're still included in sort order, just invisible for user. At the moment, solution in Java is needed, but I think the issue is language-independent.
Is there a better approach at this?

I would simply modify the script that switches the OrderNumber values so it does it correctly without relying on their being without gaps.
I don't know what arguments your script accepts and how it uses them, but the one that I've eventually come up with accept the ID of the item to move and the number of positions to move by (a negative value would mean "toward the lower OrderNumber values", and a positive one would imply the opposite direction).
The idea is as follows:
Look up the specified item's OrderNumber.
Rank all the items starting from OrderNumber in the direction determined by the second argument. The specified item thus receives the ranking of 1.
Pick the items with rankings from 1 to the one that is the absolute value of the second argument plus one. (I.e. the last item is the one where the specified item is being moved to.)
Join the resulting set with itself so that every row is joined with the next one and the last row is joined with the first one and thus use one set of rows to update the other.
This is the query that implements the above, with comments explaining some tricky parts:
Edited: fixed an issue with incorrect reordering
/* these are the arguments of the query */
DECLARE #ID int, #JumpBy int;
SET #ID = ...
SET #JumpBy = ...
DECLARE #OrderNumber int;
/* Step #1: Get OrderNumber of the specified item */
SELECT #OrderNumber = OrderNumber FROM atable WHERE ID = #ID;
WITH ranked AS (
/* Step #2: rank rows including the specified item and those that are sorted
either before or after it (depending on the value of #JumpBy */
SELECT
*,
rnk = ROW_NUMBER() OVER (
ORDER BY OrderNumber * SIGN(#JumpBy)
/* this little "* SIGN(#JumpBy)" trick ensures that the
top-ranked item will always be the one specified by #ID:
* if we are selecting rows where OrderNumber >= #OrderNumber,
the order will be by OrderNumber and #OrderNumber will be
the smallest item (thus #1);
* if we are selecting rows where OrderNumber <= #OrderNumber,
the order becomes by -OrderNumber and #OrderNumber again
becomes the top ranked item, because its negative counterpart,
-#OrderNumber, will again be the smallest one
*/
)
FROM atable
WHERE OrderNumber >= #OrderNumber AND #JumpBy > 0
OR OrderNumber <= #OrderNumber AND #JumpBy < 0
),
affected AS (
/* Step #3: select only rows that need be affected */
SELECT *
FROM ranked
WHERE rnk BETWEEN 1 AND ABS(#JumpBy) + 1
)
/* Step #4: self-join and update */
UPDATE old
SET OrderNumber = new.OrderNumber
FROM affected old
INNER JOIN affected new ON old.rnk = new.rnk % (ABS(#JumpBy) + 1) + 1
/* if old.rnk = 1, the corresponding new.rnk is N,
because 1 = N MOD N + 1 (N is ABS(#JumpBy)+1),
for old.rnk = 2 the matching new.rnk is 1: 2 = 1 MOD N + 1,
for 3, it's 2 etc.
this condition could alternatively be written like this:
new.rnk = (old.rnk + ABS(#JumpBy) - 1) % (ABS(#JumpBy) + 1) + 1
*/
Note: this assumes SQL Server 2005 or later version.
One known issue with this solution is that it will not "move" rows correctly if the specified ID cannot be moved exactly by the specified number of positions (for instance, if you want to move the topmost row up by any number of positions, or the second row by two or more positions etc.).

Ok - if I'm not mistaken, you want to defragment your OrderNumber.
What if you use ROW_NUMBER() for this ?
Example:
;WITH calc_cte AS (
SELECT
ID
, OrderNumber
, RowNo = ROW_NUMBER() OVER (ORDER BY ID)
FROM
dbo.Order
)
UPDATE
c
SET
OrderNumber = c.RowNo
FROM
calc_cte c
WHERE EXISTS (SELECT * FROM inserted i WHERE c.ID = i.ID)

Didn't want to reply my own question, but I believe I have found a solution.
Insert query:
INSERT INTO table (OrderNumber, col1, col2)
VALUES ((select count(*)+1 from table),val1,val2)
Delete trigger:
CREATE TRIGGER Cleanup_After_Delete ON table
AFTER DELETE AS
BEGIN
WITH rowtable AS (SELECT [ID], OrderNumber, rownum = ROW_NUMBER()
OVER (ORDER BY OrderNumber ASC) FROM table)
UPDATE rt SET OrderNumber = rt.rownum FROM rowtable rt
WHERE OrderNumber >= (SELECT OrderNumber FROM deleted)
END
The trigger fires up after every delete and corrects all OrderNumbers above the deleted one (no gaps). This means that I can simply change the order of 2 records by switching their OrderNumbers.
This is a working solution for my problem, however this one is also very good one, perhaps more useful for others.

Selecting most recent and specific version in each group of records, for multiple groups

The problem:
I have a table that records data rows in foo. Each time the row is updated, a new row is inserted along with a revision number. The table looks like:
id rev field
1 1 test1
2 1 fsdfs
3 1 jfds
1 2 test2
Note: the last record is a newer version of the first row.
Is there an efficient way to query for the latest version of a record and for a specific version of a record?
For instance, a query for rev=2 would return the 2, 3 and 4th row (not the replaced 1st row though) while a query for rev=1 yields those rows with rev <= 1 and in case of duplicated ids, the one with the higher revision number is chosen (record: 1, 2, 3).
I would not prefer to return the result in an iterative way.

To get only latest revisions:
SELECT * from t t1
WHERE t1.rev =
(SELECT max(rev) FROM t t2 WHERE t2.id = t1.id)
To get a specific revision, in this case 1 (and if an item doesn't have the revision yet the next smallest revision):
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev)
FROM foo t2
WHERE t2.id = t1.id
AND t2.rev <= 1)
It might not be the most efficient way to do this, but right now I cannot figure a better way to do this.

Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest data rows as it avoids computing MAX(rev). It also works when you're doing bulk updates of subsets of the table. I needed this pattern to ensure I could efficiently switch to a new data set that was updated via a long running batch update without any windows of time where we had partially updated data visible.
Aging
Replace the rev column with an age column
Create a view of the current latest data with filter: age = 0
To create a new version of your data ...
INSERT: new rows with age = -1 - This was my slow long running batch process.
UPDATE: UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and also ages older data in a single transaction.
DELETE: rows having age > N in the subset - Optionally purge old data
Indexing
Create a composite index with age and then id so the view will be nice and fast and can also be used to look up by id. Although this key is effectively unique, its temporarily non-unique when you're ageing the rows (during UPDATE SET age=age+1) so you'll need to make it non-unique and ideally the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id then age.
Rollback
Finally ... Lets say you're having a bad day and the batch processing breaks. You can quickly revert to a previous data set version by running:
UPDATE table-name SET age = age - 1 -- Roll back a version
DELETE table-name WHERE age < 0 -- Clean up bad stuff
Existing Table
Suppose you have an existing table that now needs to support aging. You can use this pattern by first renaming the existing table, then add the age column and indexing and then create the view that includes the age = 0 condition with the same name as the original table name.
This strategy may or may not work depending on the nature of technology layers that depended on the original table but in many cases swapping a view for a table should drop in just fine.
Notes
I recommend naming the age column to RowAge in order to indicate this pattern is being used, since it's clearer that its a database related value and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.
Unlike other solutions, this pattern works for non SQL Server databases.
If the subsets you're updating are very large then this might not be a good solution as your final transaction will update not just the current records but all past version of the records in this subset (which could even be the entire table!) so you may end up locking the table.

This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later
Sample data:
DECLARE #foo TABLE (
id int,
rev int,
field nvarchar(10)
)
INSERT #foo VALUES
( 1, 1, 'test1' ),
( 2, 1, 'fdsfs' ),
( 3, 1, 'jfds' ),
( 1, 2, 'test2' )
The query:
DECLARE #desiredRev int
SET #desiredRev = 2
SELECT * FROM (
SELECT
id,
rev,
field,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
FROM #foo WHERE rev <= #desiredRev
) numbered
WHERE rn = 1
The inner SELECT returns all relevant records, and within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.
The outer SELECT just selects the first member (so, the one with highest rev) from each id group.
Output when #desiredRev = 2 :
id rev field rn
----------- ----------- ---------- --------------------
1 2 test2 1
2 1 fdsfs 1
3 1 jfds 1
Output when #desiredRev = 1 :
id rev field rn
----------- ----------- ---------- --------------------
1 1 test1 1
2 1 fdsfs 1
3 1 jfds 1

If you want all the latest revisions of each field, you can use
SELECT C.rev, C.fields FROM (
SELECT MAX(A.rev) AS rev, A.id
FROM yourtable A
GROUP BY A.id)
AS B
INNER JOIN yourtable C
ON B.id = C.id AND B.rev = C.rev
In the case of your example, that would return
rev field
1 fsdfs
1 jfds
2 test2

SELECT
MaxRevs.id,
revision.field
FROM
(SELECT
id,
MAX(rev) AS MaxRev
FROM revision
GROUP BY id
) MaxRevs
INNER JOIN revision
ON MaxRevs.id = revision.id AND MaxRevs.MaxRev = revision.rev

SELECT foo.* from foo
left join foo as later
on foo.id=later.id and later.rev>foo.rev
where later.id is null;

How about this?
select id, max(rev), field from foo group by id
For querying specific revision e.g. revision 1,
select id, max(rev), field from foo where rev <= 1 group by id

Select statement, table sample, equal distribution

Let's assume there is a SQL Server 2008 table like below, that holds 10 million rows.
One of the fields is Id, since it's identity it is from 1 to 10 million.
CREATE TABLE dbo.Stats
(
id INT IDENTITY(1,1) PRIMARY KEY,
field1 INT,
field2 INT,
...
)
Is there an efficient way by doing one select statement to get a subset of this data that satisfies the following requirements:
contains a limited number of rows in the result set, i.e. 100, 200, etc.
provides equal distribution of a certain column, not random, i.e. of column id
So, in our example, if we return 100 rows, the result set would look like this:
Row 1 - 100 000
Row 2 - 200 000
Row 3 - 300 000
...
Row 100 - 10 000 000
I want to avoid using cursor and storing this in a separate table.

Not sure how efficient it's going to be, but thie following query will return every 100000th row (relative to ordering established by id):
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY id) RN
FROM Stats
) T
WHERE RN % 100000 = 0
ORDER BY id
Since it does not rely on actual id values, this will work even if you have "holes" in the sequence of id values.

Something like this?
SELECT id FROM dbo..Stats WHERE id % 100000 = 0
it should work, since you are saying that id goes from 1 to 10 000 000. If number of rows is not known, but number of resulting rows is what you know, then just calculate that 100000 number like (if you would like 100 resulting rows):
SELECT id FROM Stats WHERE (id % (SELECT COUNT(id) FROM Stats) / 100) = 0

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas