SQL Server rand() aggregate

Problem: a table of coordinate lat/lngs. Two rows can potentially have the same coordinate. We want a query that returns a set of rows with unique coordinates (within the returned set). Note that DISTINCT is not usable because I need to return the id column, which is by definition distinct. This sort of works (@maxcount is the number of rows we need, intid is a unique int id column):
select top (@maxcount) max(intid)
from Documents d
group by d.geoLng, d.geoLat
Unfortunately, it will always return the same row for a given coordinate, which is a bit of a shame for my use. If only we had a rand() aggregate we could use instead of max()... Note that you can't use max() with GUIDs created by newid().
Any ideas?
(there's some more background here, if you're interested: http://www.itu.dk/~friism/blog/?p=121)
UPDATE: Full solution here

You might be able to use a CTE for this with the ROW_NUMBER function across lat and long and then use rand() against that. Something like:
WITH cte AS
(
SELECT
intID,
ROW_NUMBER() OVER
(
PARTITION BY geoLat, geoLng
ORDER BY NEWID()
) AS row_num,
COUNT(intID) OVER (PARTITION BY geoLat, geoLng) AS TotalCount
FROM
dbo.Documents
)
SELECT TOP (@maxcount)
intID, RAND(intID)
FROM
cte
WHERE
-- an unseeded RAND() is evaluated once per query in SQL Server, so every row sees the same value
row_num = 1 + FLOOR(RAND() * TotalCount)
This will always return the first sets of lat and lngs and I haven't been able to make the order random. Maybe someone can continue on with this approach. It will give you a random row within the matching lat and lng combinations though.
If I have more time later I'll try to get around that last obstacle.
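As a rough, runnable sketch of how the last obstacle might be handled, here is the same CTE idea executed through Python's built-in sqlite3 module rather than SQL Server. This assumes SQLite 3.25+ for window functions. RANDOM() stands in for NEWID() and LIMIT stands in for TOP. Because the partition is already shuffled, keeping row_num = 1 is enough, and a second ORDER BY RANDOM() on the outer query shuffles which coordinates are returned. The sample rows are made up.

```python
import sqlite3


def random_row_per_coordinate(conn, max_count):
    """Return up to max_count (intid, geoLat, geoLng) rows with distinct coordinates."""
    sql = """
        WITH cte AS (
            SELECT intid, geoLat, geoLng,
                   ROW_NUMBER() OVER (
                       PARTITION BY geoLat, geoLng
                       ORDER BY RANDOM()   -- shuffle rows within each coordinate
                   ) AS row_num
            FROM Documents
        )
        SELECT intid, geoLat, geoLng
        FROM cte
        WHERE row_num = 1                  -- keep one random row per coordinate
        ORDER BY RANDOM()                  -- shuffle which coordinates come first
        LIMIT ?                            -- SQLite's counterpart to TOP (@maxcount)
    """
    return conn.execute(sql, (max_count,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Documents (intid INTEGER PRIMARY KEY, geoLat REAL, geoLng REAL)")
conn.executemany(
    "INSERT INTO Documents VALUES (?, ?, ?)",
    [(1, 55.0, 12.0), (2, 55.0, 12.0), (3, 56.0, 10.0), (4, 56.0, 10.0), (5, 57.0, 9.0)],
)
rows = random_row_per_coordinate(conn, 2)
print(rows)
```

Each call can return different ids and coordinates, but no coordinate appears twice in one result.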

Does this not work for you?
select top (@maxcount) *
from
(
select max(intid) as id from Documents d group by d.geoLng, d.geoLat
) t
order by newid()

Where did you get the idea that DISTINCT only works on one column? Anyway, you could also use a GROUP BY clause.

Related

SQL random sampling into equal groups

I need to randomly sample users in a table into 4 equal groups using SQL from a table. For that I did the below:
First, I randomize all users in the table using the RANDOM() function, then pass the result to the NTILE() function to divide them into 4 equal groups, like below:
WITH randomised_users AS (
SELECT *
FROM users_table
ORDER BY RANDOM()
) SELECT *,
ntile(4) OVER(ORDER BY (SELECT 1)) AS tile_nr
FROM randomised_users
Is this approach of sampling correct or is there a chance for bias in the 4 groups created from this?
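What you have looks fine to me. You don't need a subquery, by the way; this will do just fine:
select *, ntile(4) over (order by random()) as tile_nr from users_table
Putting the random ordering inside the window is also more robust, since an ORDER BY inside a CTE is not guaranteed to carry over to the outer query.
Snowflake doesn't guarantee that the query will reproduce the same result set even if you provide a random seed, so make sure to dump any intermediate result set into a temp table if you plan on reusing it.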
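A minimal sketch of the single-query version, run in SQLite through Python's sqlite3 module instead of Snowflake. This assumes SQLite 3.25+, where NTILE and RANDOM() are available. The user_id column and the ten sample users are hypothetical. The point it illustrates is that NTILE sizes the groups so they differ by at most one row.

```python
import sqlite3
from collections import Counter


def assign_groups(conn, n_groups):
    """Shuffle users into n_groups groups whose sizes differ by at most one."""
    # the group count is inlined as a validated int; the random order lives inside the window
    sql = f"""
        SELECT user_id, NTILE({int(n_groups)}) OVER (ORDER BY RANDOM()) AS tile_nr
        FROM users_table
    """
    return dict(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_table (user_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users_table VALUES (?)", [(i,) for i in range(1, 11)])
groups = assign_groups(conn, 4)
sizes = sorted(Counter(groups.values()).values(), reverse=True)
print(sizes)  # 10 users into 4 tiles: [3, 3, 2, 2]
```

Which users land in which tile changes on every run. The tile sizes do not change.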

Incorporate a concatenation and count in a SQL update command?

I am looking for a way to update records so that each entry gets an incrementing number appended to the end of the string. In my case, I'm trying to update a field named FiberID. Each record should have JCK0.R000.Ax, where x is 1, 2, 3, ..., 24.
Ideal result:
FiberID
JCK0.R000.A1
JCK0.R000.A2
JCK0.R000.A3
... and so on until it reaches A24.
Here is an example of the data.
This seems so useful that I'm sure it has been discussed here before, but for whatever reason I'm not seeing anything.
You could use row_number() and an updatable CTE:
with cte as (
select
fiber_id,
concat(
fiber_id,
'.A',
cast(row_number() over (partition by fiber_id order by id) as varchar(2))
) new_fiber_id
from mytable
)
update cte set fiber_id = new_fiber_id
This assumes that you have a column called id that can be used to order records having the same fiber_id.
Side note: it is unclear why you should have exactly 24 numbers per fiber_id, and your sample data does not describe that. This will assign increasing numbers to duplicate fiber_ids, regardless of how many there are.
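Here is a rough sketch of the same numbering in SQLite via Python's sqlite3 module, assuming SQLite 3.25+ for ROW_NUMBER(). SQLite has no updatable CTEs, so the sketch computes the new values with the window function and then writes them back by id. This is a different mechanism from the T-SQL answer, but the numbering logic is the same. The table contents are made up.

```python
import sqlite3


def number_fiber_ids(conn):
    """Append .A1, .A2, ... to each fiber_id, numbering duplicates in id order."""
    rows = conn.execute("""
        SELECT id,
               fiber_id || '.A' ||
               CAST(ROW_NUMBER() OVER (PARTITION BY fiber_id ORDER BY id) AS TEXT)
        FROM mytable
    """).fetchall()
    # SQLite has no updatable CTE, so write the computed values back by primary key
    conn.executemany(
        "UPDATE mytable SET fiber_id = ? WHERE id = ?",
        [(new_value, row_id) for row_id, new_value in rows],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, fiber_id TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "JCK0.R000"), (2, "JCK0.R000"), (3, "JCK0.R000"), (4, "JCK0.R001")],
)
number_fiber_ids(conn)
result = [r[0] for r in conn.execute("SELECT fiber_id FROM mytable ORDER BY id")]
print(result)  # ['JCK0.R000.A1', 'JCK0.R000.A2', 'JCK0.R000.A3', 'JCK0.R001.A1']
```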

SQL Server - Pagination Without Order By Clause

My situation is that an unpredictable SQL statement is given to the program, and I need to paginate on top of it. The final SQL statement would be similar to the following one:
SELECT * FROM (*Given SQL Statement*) b
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain an ORDER BY clause. I am not able to change the query result of this SQL statement, and I need to paginate it.
I searched for a solution on the Internet, but all of them suggested using an arbitrary column, like the primary key, in the ORDER BY clause. But that would change the original order.
The short answer is that it can't be done, or at least can't be done properly.
The problem is that SQL Server (or any RDBMS) does not and can not guarantee the order of the records returned from a query without an order by clause.
This means that you can't use paging on such queries.
Furthermore, if you use an ORDER BY clause on a column whose values repeat in your result set, the order within each group of equal values is still not guaranteed. A quick example:
;WITH cte (a, b)
AS
(
SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, 'b'
)
SELECT *
FROM cte
ORDER BY a
Both result sets are valid, and you can't know in advance which one you will get:
a b
-----
1 b
1 a
2 b
2 a
a b
-----
1 a
1 b
2 a
2 b
(and of course, you might get other sorts)
The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain order by clause.
Your inner query (the unpredictable SQL statement) should not contain ORDER BY; even if it does, the order is not guaranteed.
To get a guaranteed order, you have to order by some column. For the results to be deterministic, the ordered column or columns must be unique.
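A small illustration of the uniqueness point, run in SQLite through Python's sqlite3 module and reusing the (a, b) rows from the example above. The table name pairs is hypothetical. Ordering by a alone leaves ties between rows that share a value of a. Adding b makes the sort key unique, so only one order is valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (a INTEGER, b TEXT)")
# insert in a scrambled order so storage order cannot be mistaken for sort order
conn.executemany("INSERT INTO pairs VALUES (?, ?)", [(2, "b"), (1, "b"), (2, "a"), (1, "a")])

# ORDER BY a alone leaves ties between rows with equal a unresolved;
# adding b makes (a, b) unique, so the order is fully determined
rows = conn.execute("SELECT a, b FROM pairs ORDER BY a, b").fetchall()
print(rows)  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```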
Please note: what I'm about to suggest is probably horribly inefficient and should really only be used to help you go back to the project leader and tell them that pagination of an unordered query should not be done. Having said that...
From your comments you say you are able to change the SQL statement before it is executed.
You could write the results of the original query to a temporary table, adding a row-number column to be used for subsequent pagination ordering.
Therefore any original ordering is preserved and you can now paginate.
But of course the reason for needing pagination in the first place is to avoid sending large amounts of data to the client application. Although this does prevent that, you will still be copying data to a temp table which, depending on the row size and count, could be very slow.
You also have the problem that the page size is coming from the client as part of the SQL statement. Parsing the statement to pick that out could be tricky.
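A minimal sketch of the temp-table idea, using SQLite via Python's sqlite3 module instead of SQL Server. SQLite's implicit rowid plays the role of the added row-number column. This relies on SQLite assigning rowids in insertion order for a freshly created table, which is an implementation behaviour and not a SQL guarantee. The table t and its values are made up.

```python
import sqlite3


def paginate_via_temp_table(conn, given_sql, page_number, page_size):
    """Copy the result of given_sql into a temp table, then return one page of it.

    Relies on SQLite assigning rowids in insertion order for a freshly created
    table, so rowid records the order in which given_sql produced its rows.
    """
    conn.execute("DROP TABLE IF EXISTS temp.paged")
    # given_sql is spliced in verbatim, as in the question's wrapper approach
    conn.execute(f"CREATE TEMP TABLE paged AS {given_sql}")
    return conn.execute(
        "SELECT * FROM temp.paged ORDER BY rowid LIMIT ? OFFSET ?",
        (page_size, (page_number - 1) * page_size),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in range(1, 8)])
page2 = paginate_via_temp_table(conn, "SELECT val FROM t ORDER BY val DESC", 2, 3)
page3 = paginate_via_temp_table(conn, "SELECT val FROM t ORDER BY val DESC", 3, 3)
print(page2, page3)  # [(4,), (3,), (2,)] [(1,)]
```

The descending order of the given query survives into the pages. The cost is that the full result is copied on every call, which is the inefficiency described above.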
As others have noted, no approach that avoids a sorted query will be safe. But since you know that and have researched it, I can suggest a query like this (though it is not recommended as a good approach):
;with cte as (
select *,
row_number() over (order by (select 0)) rn
from (
-- Your query
) t
)
select *
from cte
where rn between (@pageNumber-1)*@pageSize+1 and @pageNumber*@pageSize
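To make the BETWEEN arithmetic concrete, here is a sketch in SQLite via Python's sqlite3 module. It assumes SQLite 3.25+ for ROW_NUMBER(). The items table is hypothetical. The window orders by id so that the result is repeatable, which departs from the order-less version above on purpose.

```python
import sqlite3


def page_bounds(page_number, page_size):
    """1-based inclusive row-number range for a page, as in the BETWEEN clause."""
    return (page_number - 1) * page_size + 1, page_number * page_size


def fetch_page(conn, page_number, page_size):
    first, last = page_bounds(page_number, page_size)
    sql = """
        WITH cte AS (
            SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn  -- deterministic order
            FROM items
        )
        SELECT id FROM cte WHERE rn BETWEEN ? AND ? ORDER BY rn
    """
    return [row[0] for row in conn.execute(sql, (first, last))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 9)])
print(page_bounds(3, 50))      # (101, 150)
print(fetch_page(conn, 2, 3))  # [4, 5, 6]
print(fetch_page(conn, 3, 3))  # [7, 8]
```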
I finally found a simple way to do it without any order by on a specific column:
declare @start AS INTEGER = 1, @count AS INTEGER = 5; -- OFFSET is zero-based, so @start = 1 skips the first row
select * from (SELECT *,ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS fakeCounter
FROM (select * from mytable) AS t) AS t2 order by fakeCounter OFFSET @start ROWS
FETCH NEXT @count ROWS ONLY
where select * from mytable can be any query

How to select bottom most rows?

I can do SELECT TOP (200) ... but why not BOTTOM (200)?
Without getting into philosophy, what I mean is: how can I do the equivalent of TOP (200) but in reverse (from the bottom, like you'd expect BOTTOM to do)?
SELECT
columns
FROM
(
SELECT TOP 200
columns
FROM
My_Table
ORDER BY
a_column DESC
) SQ
ORDER BY
a_column ASC
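Here is the same descending-then-ascending pattern run in SQLite through Python's sqlite3 module, with LIMIT standing in for TOP. The My_Table data is made up and inserted in shuffled order, so the result cannot depend on storage order. It assumes a_column defines the order you actually mean by "bottom".

```python
import random
import sqlite3


def bottom_n(conn, n):
    """Last n values of a_column in ascending order: take the top n descending, then re-sort."""
    sql = """
        SELECT a_column
        FROM (SELECT a_column FROM My_Table ORDER BY a_column DESC LIMIT ?) AS sq
        ORDER BY a_column ASC
    """
    return [row[0] for row in conn.execute(sql, (n,))]


values = list(range(1, 11))
random.shuffle(values)  # storage order is deliberately unrelated to sort order
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE My_Table (a_column INTEGER)")
conn.executemany("INSERT INTO My_Table VALUES (?)", [(v,) for v in values])
print(bottom_n(conn, 3))  # [8, 9, 10]
```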
It is unnecessary. You can use an ORDER BY and just change the sort to DESC to get the same effect.
Sorry, but I don't see any correct answers here.
The TOP x function returns the records in an undefined order. It follows from that definition that a BOTTOM function cannot be defined.
This holds independently of any index or sort order. When you do an ORDER BY y DESC you get the rows with the highest y value first. If this is an autogenerated ID, it should show the records last added to the table, as suggested in the other answers. However:
This only works if there is an autogenerated id column
It has a significant performance impact if you compare that with the TOP function
The correct answer should be that there is not, and cannot be, an equivalent to TOP for getting the bottom rows.
Logically,
BOTTOM (x) is all the records except TOP (n - x), where n is the count; x <= n
E.g. Select Bottom 1000 from Employee:
In T-SQL,
DECLARE
@bottom int,
@count int
SET @bottom = 1000
SET @count = (select COUNT(*) from Employee)
select * from Employee emp where emp.EmployeeID not in
(
-- without an ORDER BY, which rows TOP returns is not guaranteed
SELECT TOP (@count-@bottom) Employee.EmployeeID FROM Employee
)
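Here is a sketch of the "everything except TOP (n - x)" logic in SQLite via Python's sqlite3 module, with LIMIT standing in for TOP. Unlike the T-SQL above, the sketch adds ORDER BY EmployeeID to the inner query. That makes "bottom" mean the highest EmployeeIDs, which is an assumption about what is wanted. The Employee rows are made up.

```python
import sqlite3


def bottom_except_top(conn, bottom):
    """All employees except the first (count - bottom), i.e. the last `bottom` by EmployeeID."""
    count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
    # clamp at 0: a negative LIMIT means "no limit" in SQLite, which would exclude everything
    skip = max(count - bottom, 0)
    sql = """
        SELECT EmployeeID FROM Employee
        WHERE EmployeeID NOT IN (
            SELECT EmployeeID FROM Employee ORDER BY EmployeeID LIMIT ?  -- ordered "TOP"
        )
        ORDER BY EmployeeID
    """
    return [row[0] for row in conn.execute(sql, (skip,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Employee VALUES (?)", [(i,) for i in range(1, 11)])
print(bottom_except_top(conn, 3))  # [8, 9, 10]
```

This needs a COUNT(*) plus an anti-join, so it reads the table more than once. That is part of the performance concern raised in the next answer.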
It would seem that any of the answers which implement an ORDER BY clause in the solution is missing the point, or does not actually understand what TOP returns to you.
TOP returns an unordered query result set which limits the record set to the first N records returned. (From an Oracle perspective, it is akin to adding WHERE ROWNUM < (N+1).)
Any solution which uses an ORDER BY may return rows that are also returned by the TOP clause (since that data set was unordered in the first place), depending on what criteria were used in the ORDER BY.
The usefulness of TOP is that once the dataset reaches a certain size N, it stops fetching rows. You can get a feel for what the data looks like without having to fetch all of it.
To implement BOTTOM accurately, it would need to fetch the entire dataset unordered and then restrict the dataset to the final N records. That will not be particularly effective if you are dealing with huge tables. Nor will it necessarily give you what you think you are asking for. The end of the data set may not necessarily be "the last rows inserted" (and probably won't be for most DML intensive applications).
Similarly, the solutions which implement an ORDER BY are, unfortunately, potentially disastrous when dealing with large data sets. If I have, say, 10 billion records and want the last 10, it is quite foolish to order 10 billion records and select the last 10.
The problem here is that BOTTOM does not have the meaning that we think of when comparing it to TOP.
When records are inserted, deleted, inserted, deleted over and over and over again, some gaps will appear in the storage and later, rows will be slotted in, if possible. But what we often see, when we select TOP, appears to be sorted data, because it may have been inserted early on in the table's existence. If the table does not experience many deletions, it may appear to be ordered. (e.g. creation dates may be as far back in time as the table creation itself). But the reality is, if this is a delete-heavy table, the TOP N rows may not look like that at all.
So -- the bottom line here (pun intended) is that someone who is asking for the BOTTOM N records doesn't actually know what they're asking for. Or, at least, what they're asking for and what BOTTOM actually means are not the same thing.
So -- the solution may meet the actual business need of the requestor...but does not meet the criteria for being the BOTTOM.
First, create an index in a subquery according to the table's original order using:
ROW_NUMBER () OVER (ORDER BY (SELECT NULL) ) AS RowIndex
Then order the table descending by the RowIndex column you've created in the main query:
ORDER BY RowIndex DESC
And finally use TOP with your wanted quantity of rows:
SELECT TOP 1 * --(or 2, or 5, or 34)
FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL) ) AS RowIndex, *
FROM MyTable) AS SubQuery
ORDER BY RowIndex DESC
All you need to do is reverse your ORDER BY. Add or remove DESC to it.
The problem with ordering the other way is that it often does not make good use of indices. It is also not very extendable if you ever need to select a number of rows that are not at the start or the end. An alternative way is as follows.
DECLARE @NumberOfRows int;
SET @NumberOfRows = (SELECT COUNT(*) FROM TheTable);
SELECT col1, col2,...
FROM (
SELECT col1, col2,..., ROW_NUMBER() OVER (ORDER BY col1) AS intRow
FROM TheTable
) AS T
WHERE intRow > @NumberOfRows - 20;
The currently accepted answer by "Justin Ethier" is not a correct answer as pointed out by "Protector one".
As far as I can see, as of now, no other answer or comment provides the equivalent of BOTTOM(x) the question author asked for.
First, let's consider a scenario where this functionality would be needed:
SELECT * FROM Split('apple,orange,banana,apple,lime',',')
This returns a table of one column and five records:
apple
orange
banana
apple
lime
As you can see: we don't have an ID column; we can't order by the returned column; and we can't select the bottom two records using standard SQL like we can do for the top two records.
Here is my attempt to provide a solution:
SELECT * INTO #mytemptable FROM Split('apple,orange,banana,apple,lime',',')
ALTER TABLE #mytemptable ADD tempID INT IDENTITY
SELECT TOP 2 * FROM #mytemptable ORDER BY tempID DESC
DROP TABLE #mytemptable
And here is a more complete solution:
SELECT * INTO #mytemptable FROM Split('apple,orange,banana,apple,lime',',')
ALTER TABLE #mytemptable ADD tempID INT IDENTITY
DELETE FROM #mytemptable WHERE tempID <= ((SELECT COUNT(*) FROM #mytemptable) - 2)
ALTER TABLE #mytemptable DROP COLUMN tempID
SELECT * FROM #mytemptable
DROP TABLE #mytemptable
I am by no means claiming that this is a good idea to use in all circumstances, but it provides the desired results.
You can use the OFFSET FETCH clause.
SELECT COUNT(1) FROM COHORT; --Number of results to expect
SELECT * FROM COHORT
ORDER BY ID
OFFSET 900 ROWS --Assuming you expect 1000 rows
FETCH NEXT 100 ROWS ONLY;
(This is for Microsoft SQL Server)
Further reading:
https://www.sqlservertutorial.net/sql-server-basics/sql-server-offset-fetch/
"Tom H" answer above is correct and it works for me in getting Bottom 5 rows.
SELECT [KeyCol1], [KeyCol2], [Col3]
FROM
(SELECT TOP 5 [KeyCol1],
[KeyCol2],
[Col3]
FROM [dbo].[table_name]
ORDER BY [KeyCol1] DESC, [KeyCol2] DESC) SOME_ALIAS
ORDER BY [KeyCol1],[KeyCol2] ASC
Thanks.
Try this:
declare @floor int --this is the offset from the bottom, the number of results to exclude
declare @resultLimit int --the number of results actually retrieved for use
declare @total int --just adds them up, the total number of results fetched initially
--the following gathers the top 60 results in total, then gets rid of the top 50. We only keep the last 10
set @floor = 50
set @resultLimit = 10
set @total = @floor + @resultLimit
declare @tmp0 table(
--table body
)
declare @tmp1 table(
--table body
)
--this inserts the first @total rows from whatever table we're selecting from
insert into @tmp0
select top (@total) --what to select (the where, from, etc)
--using @floor, insert the part we don't want into the second table variable
--(add an ORDER BY to both TOP queries; without one, which rows TOP returns is not guaranteed)
insert into @tmp1
select top (@floor) * from @tmp0
--using select except, exclude the top @floor results from the query
select * from @tmp0
except
select * from @tmp1
I've come up with a solution to this that doesn't require you to know the number of rows returned.
For example, if you want to get all the locations logged in a table, except the latest 1 (or 2, or 5, or 34)
SELECT *
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY CreatedDate DESC) AS Row, *
FROM Locations
WHERE UserId = 12345) AS SubQuery
WHERE Row > 1 -- or 2, or 5, or 34
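A runnable sketch of the "all but the latest k" pattern in SQLite through Python's sqlite3 module, assuming SQLite 3.25+. Numbering by CreatedDate DESC puts the newest row at rn = 1, so rn > k skips the k most recent ones. The id column and sample rows are hypothetical, and ISO-8601 date strings are assumed so that text order matches time order.

```python
import sqlite3


def locations_except_latest(conn, user_id, k):
    """All of a user's locations except the k most recent ones, oldest first."""
    sql = """
        SELECT id FROM (
            SELECT ROW_NUMBER() OVER (ORDER BY CreatedDate DESC) AS rn, id, CreatedDate
            FROM Locations
            WHERE UserId = ?
        ) AS sub
        WHERE rn > ?
        ORDER BY CreatedDate
    """
    return [row[0] for row in conn.execute(sql, (user_id, k))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Locations (id INTEGER PRIMARY KEY, UserId INTEGER, CreatedDate TEXT)")
conn.executemany(
    "INSERT INTO Locations VALUES (?, ?, ?)",
    [(1, 12345, "2024-01-01"), (2, 12345, "2024-01-02"), (3, 99, "2024-01-03"),
     (4, 12345, "2024-01-04"), (5, 12345, "2024-01-05")],
)
print(locations_except_latest(conn, 12345, 1))  # [1, 2, 4]
```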
Querying a simple subquery sorted descending, followed by sorting on the same column ascending does the trick.
SELECT * FROM
(SELECT TOP 200 * FROM [table] t2 ORDER BY t2.[column] DESC) t1
ORDER BY t1.[column]
SELECT TOP 10 * FROM TABLE1 ORDER BY ID DESC
Where ID is the primary key of the TABLE1.
SELECT columns FROM My_Table LIMIT 200 OFFSET (SELECT COUNT(*) - 200 FROM My_Table)

Can I get the position of a record in a SQL result table?

If I do something like
SELECT * FROM mytable ORDER BY mycolumn ASC;
I get a result table in a specific order.
Is there a way in SQL to efficiently find out, given a PK, what position in that result table would contain the record with my PK?
You can count the number of records where the value that you are sorting on has a lower value than the record that you know the key value of:
select count(*)
from mytable
where mycolumn < (select mycolumn from mytable where key = 42)
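Here is the counting approach run in SQLite via Python's sqlite3 module. The primary key column is assumed to be named id, and the rows are made up. The count of smaller values is a zero-based position, so the sketch adds 1 to make it 1-based. Rows that tie on mycolumn share a position.

```python
import sqlite3


def position_by_count(conn, pk):
    """1-based position of row pk when sorted by mycolumn (ties share a position)."""
    sql = """
        SELECT COUNT(*) + 1
        FROM mytable
        WHERE mycolumn < (SELECT mycolumn FROM mytable WHERE id = ?)
    """
    return conn.execute(sql, (pk,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycolumn TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "c"), (2, "a"), (3, "d"), (4, "b")])
print(position_by_count(conn, 1))  # 3: sorted order is a, b, c, d
```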
On databases that support it, you could use ROW_NUMBER() for this purpose:
SELECT RowNr
FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY mycolumn) AS RowNr,
id
FROM mytable
) sub
WHERE sub.id = 42
The example assumes that the primary key column is named id and that you're looking for primary key 42 :)
The subquery is necessary because something like:
SELECT
ROW_NUMBER() OVER (ORDER BY mycolumn) AS RowNr
FROM mytable
WHERE id = 42
will always return 1; ROW_NUMBER() works after the WHERE, so to speak.
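Both queries can be compared side by side in SQLite through Python's sqlite3 module, assuming SQLite 3.25+ and a primary key column named id. The sample rows are made up. Numbering inside the subquery sees every row. Filtering first leaves a single row, so its row number is always 1.

```python
import sqlite3


def position_by_row_number(conn, pk):
    """Row number of pk in ORDER BY mycolumn, computed before filtering."""
    sql = """
        SELECT RowNr FROM (
            SELECT ROW_NUMBER() OVER (ORDER BY mycolumn) AS RowNr, id
            FROM mytable
        ) AS sub
        WHERE sub.id = ?
    """
    return conn.execute(sql, (pk,)).fetchone()[0]


def naive_position(conn, pk):
    """Filtering first leaves one row, so ROW_NUMBER() is always 1."""
    sql = "SELECT ROW_NUMBER() OVER (ORDER BY mycolumn) FROM mytable WHERE id = ?"
    return conn.execute(sql, (pk,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycolumn TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "c"), (2, "a"), (3, "d"), (4, "b")])
print(position_by_row_number(conn, 1), naive_position(conn, 1))  # 3 1
```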
SQL doesn't work that way. It's set-based, which means that "position in that result table" is meaningless to the database.
You can keep track of position when you map the ResultSet into a collection of objects or when you iterate over it.
Unfortunately you cannot get "the position of a row in a table".
The best you can get, using ORDER BY and a variant of the ROW_NUMBER construct (depends on the database engine in use), is the position of a row in the resultset of the query executed.
This position does not map back to any position in the table, though, unless the ORDER BY is on a set of clustered index columns, but even then that position might be invalidated the next second.
What I would like to know is what you intended to use this "position" for.
This answer applies to MySQL
Before 8.0:
SET @row_number = 0;
SELECT
(@row_number:=@row_number + 1) AS num,
myColumn.first,
myColumn.second
FROM
myTable
ORDER BY myColumn.first, myColumn.second
source: http://www.mysqltutorial.org/mysql-row_number/
8.0 and later:
Please see the MySQL ROW_NUMBER() function manual, as I have not tested this, but that function seems to be preferred.
There's no way you can tell that without selecting an entire subset of records. If your PK is of integer type and you want the position in id order, you can use:
select count(*) from mytable
where id <= 10 -- Record with ID 10