Selecting rows that don't exist physically in the database - sql

I've completely rewritten my question because people were taking the simpler previous version too literally.
The aim:
INSERT INTO X
SELECT TOP 23452345 NEWID()
This query should insert 23452345 GUIDs into the "X" table. Here 23452345 stands for any number that is entered by the user and stored in the database.
The problem is that inserting rows into a database by using an
INSERT INTO ... SELECT ...
statement requires you to already have the required number of rows in the database.
Naturally you can emulate the existence of rows by using temporary data and cross joining it, but this (in my humble opinion) creates more results than needed and in some extreme situations might fail for unpredictable reasons. I need to be sure that if the user enters an extremely large number like 2^32 or even bigger, the system will work and behave normally without side effects such as extreme memory or time consumption.

In all fairness I derived the idea from this site.
;WITH cte AS
(
SELECT 1 x
UNION ALL
SELECT x + 1
FROM cte
WHERE x < 100
)
SELECT NEWID()
FROM cte
EDIT:
The general method we're seeing is to select from a table that has the desired number of rows. It's hackish, but you can create a table, insert the desired number of records, and select from it.
create table #num
(
num int
)
declare @i int
set @i = 1
while (@i <= 77777)
begin
insert into #num values (@i)
set @i = @i + 1
end
select NEWID() from #num
drop table #num

Of course creating a Number table is the best approach and will come in handy. You should definitely have one at your disposal. If you need something as a one-off just join to a known table. I usually use a system table such as spt_values:
declare @result table (id uniqueidentifier)
declare @sDate datetime
set @sDate = getdate();
;with num (n)
as ( select top(777777) row_number() over(order by t1.number) as N
from master..spt_values t1
cross join master..spt_values t2
)
insert into @result(id)
select newid()
from num;
select datediff(ms, @sDate, getdate()) [elapsed]

I'd create an integers table and use it. This type of table comes in handy in many situations.
CREATE TABLE dbo.Integers
(
i INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
)
WHILE COALESCE(SCOPE_IDENTITY(), 0) <= 100000 /* or some other large value */
BEGIN
INSERT dbo.Integers DEFAULT VALUES
END
Then all you need to do is:
SELECT NEWID()
FROM Integers
WHERE i <= 77777

Try this:
with
L0 as (select 1 as C union all select 1) --2 rows
,L1 as (select 1 as C from L0 as A, L0 as B) --4 rows
,L2 as (select 1 as C from L1 as A, L1 as B) --16 rows
,L3 as (select 1 as C from L2 as A, L2 as B) --256 rows
select top 100 newid() from L3
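
The row counts in the comments follow from each level cross joining the previous level with itself, so the count is squared at every step. A minimal Python sketch of that arithmetic (the function name is made up for illustration; it models only the counts, not the SQL):

```python
def stacked_row_counts(levels, base_rows=2):
    """Row count of each level when every level cross joins the previous one with itself."""
    counts = [base_rows]              # L0: two rows from "select 1 union all select 1"
    for _ in range(levels):
        counts.append(counts[-1] ** 2)  # next level = previous x previous
    return counts


print(stacked_row_counts(3))  # counts for L0..L3
```

Two more levels after L3 would give 65,536 and then 4,294,967,296 (2^32) rows, which is why this pattern is often suggested when the requested number can be very large: TOP stops the query once enough rows have been produced.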

SELECT TOP 100 NEWID() from sys.all_columns
Or use any other data source that has a large number of records. You can also build your own table for this kind of 'counting' functionality and use it in lieu of WHILE loops.
Tally tables: http://www.sqlservercentral.com/articles/T-SQL/62867

Related

Generate a list with string prefix in SQL with fixed length

I just want to generate a list like this
XY0001
XY0002
XY0003
The prefix is the same for all rows, and the values need a fixed length (6 in this example).
I'm looking for an easy way to produce such a list and put it into a temp table.
MS SQL
For a very small number of rows this would do:
DECLARE @TempList TABLE (Name VARCHAR(100));
insert into @TempList Values ('XY0001')
insert into @TempList Values ('XY0002')
insert into @TempList Values ('XY0003')
insert into @TempList Values ('XY0004')
select * from @TempList
You can use an ad-hoc tally table
If 2012+
DECLARE @TempList TABLE (Name VARCHAR(100));
Select Name = 'XY'+format(N,'0000')
From (Select Top 9999 N=Row_Number() Over (Order By (Select NULL)) From master..spt_values N1,master..spt_values N2) A
Order by N
Returns
Name
XY0001
XY0002
XY0003
XY0004
...
XY9997
XY9998
XY9999
If not
DECLARE @TempList TABLE (Name VARCHAR(100));
Select Name = 'XY'+right('00000'+cast(N as varchar(25)),4)
From (Select Top 9999 N=Row_Number() Over (Order By (Select NULL)) From master..spt_values N1,master..spt_values N2) A
Order by N
I like to use recursive CTEs for this.
declare @max_number int = 1000;
with num as (
select 1 as n
union all
select n + 1
from num
where n < @max_number
)
select 'XY' + right('0000' + cast(n as varchar(4)), 4)
from num
option (maxrecursion 0);
The recursive CTE gives you the numbers, and the RIGHT('0000' + ...) expression does the left-padding with 0's to ensure you get 0001 instead of 1.
This approach will support a variable number of outputs. Though as you alluded to in your question, this is overkill if you only want a few.
(You'll need to test this out for boundary cases. I haven't tested this exact code sample.)
There is a limit to how far this scales because it uses recursion: SQL Server stops a recursive CTE after 100 levels by default, which is why the query sets OPTION (MAXRECURSION 0) to lift that limit.
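
The recursive-CTE-plus-padding idea can be tried outside SQL Server as well. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module and SQLite syntax (`||` for concatenation, `substr` with a negative start in place of T-SQL's RIGHT), and the function name is made up for illustration.

```python
import sqlite3


def prefixed_codes(prefix, max_number, width=4):
    """Generate prefix + zero-padded counter using a recursive CTE in SQLite."""
    conn = sqlite3.connect(":memory:")
    sql = """
        WITH RECURSIVE num(n) AS (
            SELECT 1
            UNION ALL
            SELECT n + 1 FROM num WHERE n < :max_number
        )
        -- substr(text, -width) keeps the last `width` characters, like RIGHT() in T-SQL
        SELECT :prefix || substr(:zeros || n, :neg_width)
        FROM num
        ORDER BY n
    """
    params = {
        "max_number": max_number,
        "prefix": prefix,
        "zeros": "0" * width,
        "neg_width": -width,
    }
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return [row[0] for row in rows]


codes = prefixed_codes("XY", 1000)
print(codes[:3], codes[-1])
```

Note that a counter with more digits than `width` would be silently truncated by this kind of padding, so the width has to cover the largest number generated.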

Loop through sql result set and remove [n] duplicates

I've got a SQL Server db with quite a few dupes in it. Removing the dupes manually is just not going to be fun, so I was wondering if there is any sort of sql programming or scripting I can do to automate it.
Below is my query that returns the ID and the Code of the duplicates.
select a.ID, a.Code
from Table1 a
inner join (
SELECT Code
FROM Table1 GROUP BY Code HAVING COUNT(Code)>1)
x on x.Code= a.Code
I'll get a return like this, for example:
5163 51727
5164 51727
5165 51727
5166 51728
5167 51728
5168 51728
This snippet shows three rows for each Code (so a primary "good" record and two dupes). However, this isn't always the case. There can be up to [n] dupes, although 2-3 seems to be the norm.
I just want to somehow loop through this result set and delete everything but one record. THE RECORDS TO DELETE ARE ARBITRARY, as any of them can be "kept".
You can use ROW_NUMBER to drive your delete.
For example:
CREATE TABLE #table1
(id INT,
code int
);
WITH cte AS
(select a.ID, a.Code, ROW_NUMBER() OVER(PARTITION by COdE ORDER BY ID) AS rn
from #Table1 a
)
DELETE x
FROM #table1 x
JOIN cte ON x.id = cte.id
WHERE cte.rn > 1
But...
If you are going to be deleting a lot of rows from a very large table, you might be better off selecting the rows you want to keep into a temp table, then truncating your table and re-inserting those rows.
This keeps the transaction log from getting hammered and your clustered index (CI) from getting fragmented, and it should be quicker too!
It is actually very simple:
DELETE FROM Table1
WHERE ID NOT IN
(SELECT MAX(ID)
FROM Table1
GROUP BY CODE)
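
To see what this statement leaves behind, here is a small sketch that runs the same DELETE against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made so the example is self-contained), and the function name is made up for illustration.

```python
import sqlite3


def keep_one_per_code(rows):
    """Delete every row except the one with the highest ID for each Code."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Code INTEGER)")
    conn.executemany("INSERT INTO Table1 (ID, Code) VALUES (?, ?)", rows)
    conn.execute(
        "DELETE FROM Table1 "
        "WHERE ID NOT IN (SELECT MAX(ID) FROM Table1 GROUP BY Code)"
    )
    remaining = conn.execute("SELECT ID, Code FROM Table1 ORDER BY ID").fetchall()
    conn.close()
    return remaining


sample = [
    (5163, 51727), (5164, 51727), (5165, 51727),
    (5166, 51728), (5167, 51728), (5168, 51728),
]
print(keep_one_per_code(sample))  # one surviving row per Code
```

Because the question says any of the duplicates may be kept, using MAX(ID) or MIN(ID) is an arbitrary choice.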
Self join solution with a performance test VS cte.
create table codes(
id int IDENTITY(1,1) NOT NULL,
code int null,
CONSTRAINT [PK_codes_id] PRIMARY KEY CLUSTERED
(
id ASC
))
declare @counter int, @code int
set @counter = 1
set @code = 1
while (@counter <= 1000000)
begin
print ABS(Checksum(NewID()) % 1000)
insert into codes(code) select ABS(Checksum(NewID()) % 1000)
set @counter = @counter + 1
end
GO
set statistics time on;
delete a
from codes a left join(
select MIN(id) as id from codes
group by code) b
on a.id = b.id
where b.id is null
set statistics time off;
--set statistics time on;
-- WITH cte AS
-- (select a.id, a.code, ROW_NUMBER() OVER(PARTITION by code ORDER BY id) AS rn
-- from codes a
-- )
-- delete x
-- FROM codes x
-- JOIN cte ON x.id = cte.id
-- WHERE cte.rn > 1
--set statistics time off;
Performance test results:
With Join:
SQL Server Execution Times:
CPU time = 3198 ms, elapsed time = 3200 ms.
(999000 row(s) affected)
With CTE:
SQL Server Execution Times:
CPU time = 4197 ms, elapsed time = 4229 ms.
(999000 row(s) affected)
It's basically done like this:
WITH CTE_Dup AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY SalesOrderno, ItemNo ORDER BY SalesOrderno, ItemNo)
AS ROW_NO
FROM dbo.SalesOrderDetails
)
DELETE FROM CTE_Dup WHERE ROW_NO > 1;
NOTICE: MUST INCLUDE ALL FIELDS!!
Here is another example:
CREATE TABLE #Table (C1 INT,C2 VARCHAR(10))
INSERT INTO #Table VALUES (1,'SQL Server')
INSERT INTO #Table VALUES (1,'SQL Server')
INSERT INTO #Table VALUES (2,'Oracle')
SELECT * FROM #Table
;WITH Delete_Duplicate_Row_cte
AS (SELECT ROW_NUMBER()OVER(PARTITION BY C1, C2 ORDER BY C1,C2) ROW_NUM,*
FROM #Table )
DELETE FROM Delete_Duplicate_Row_cte WHERE ROW_NUM > 1
SELECT * FROM #Table

Populating an empty table with sequential numbers

I have a table which has already been truncated (Microsoft SQL 2008). I now have to populate it with sequential values, up to 50,000 records; the values are arbitrary (it doesn't matter what they are) and up to 7 characters long.
Can anyone help with the SQL statement I need to write to automatically populate the newly emptied table with A000001, A000002, A000003, etc., so that I can number and sort the records within the table?
I have approximately 50,000 records which need to be entered sequentially, and I really don't want to number the column by hand.
Thanks in advance.
I'd use excel to generate your unique ids using the following:
In A column:
=CONCATENATE($C2, TEXT($B2,"000000"))
In B column put a 1 in the first row and the following code in all subsequent rows:
=SUM($B4 + 1)
In C column:
The letter A
Then just import the excel csv as a table and you'll have all your ids ready to insert into your empty table.
The SQL below loads a table variable up. Just select from it and insert the data into the new table. Certainly not the model of efficiency, but it'll get the job done.
DECLARE @tmp TABLE(
Value NVARCHAR(10)
)
DECLARE @Counter INT=0
DECLARE @Padding NVARCHAR(20)
WHILE @Counter<50000
BEGIN
SET @Counter=@Counter+1
SET @Padding=
CASE LEN(CONVERT(NVARCHAR,@Counter))
WHEN 1 THEN '00000'
WHEN 2 THEN '0000'
WHEN 3 THEN '000'
WHEN 4 THEN '00'
WHEN 5 THEN '0'
ELSE ''
END
INSERT INTO @tmp SELECT 'A' + @Padding + CONVERT(NVARCHAR,@Counter)
END
select * from @tmp
Use Stacked CTE to generate sequential Numbers
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e2 CROSS JOIN e2 AS b), -- 100*100
e4(n) AS (SELECT 1 FROM e3 CROSS JOIN (SELECT TOP 5 n FROM e1) AS b) -- 5*10000
SELECT n = 'A'+right('000000'+
convert(varchar(20),ROW_NUMBER() OVER (ORDER BY n)),6)
FROM e4 ORDER BY n;
Check here for more methods to generate sequential numbers with performance analysis
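
The question asks for 7-character values such as A000001, so the numeric part has to be padded to 6 digits before the 'A' prefix is added. A minimal Python sketch of that string arithmetic (the function name and defaults are assumptions for illustration; the slice plays the role of T-SQL's RIGHT):

```python
def padded_id(n, prefix="A", digits=6):
    """Left-pad n with zeros to `digits` characters and add the prefix."""
    zeros = "0" * digits
    # Keep only the last `digits` characters, like RIGHT(zeros + n, digits)
    return prefix + (zeros + str(n))[-digits:]


print(padded_id(1), padded_id(50000))
```

Keeping 7 characters instead of 6 in the padded part would produce 8-character values, one longer than the 7 characters asked for.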
Use a table with an identity column and populate it. Then update that table to set the alpha value you need as follows:
create table MyTable (
ID int not null identity(1,1),
Alpha varchar(30)
)
truncate table MyTable
begin tran -- makes it run much faster
declare @i int
select @i = 1
while @i < 1000000
begin
insert into MyTable (Alpha) values ('')
select @i = @i + 1
end
commit
update MyTable set Alpha = 'A' + replicate('0', 6 - len(cast(ID as varchar(30)))) + cast(ID as varchar(30))

SQL: how to get random number of rows from one table for each row in another

I have two tables where the data is not related.
For each row in table A I want, e.g., 3 random rows from table B.
This is fairly easy using a cursor, but it is awfully slow.
So how can I express this in a single statement to avoid RBAR (row-by-agonizing-row processing)?
To get a random number between 0 and (N-1), you can use.
abs(checksum(newid())) % N
Which means to get positive values 1-N, you use
1 + abs(checksum(newid())) % N
Note: RAND() doesn't work here - it is evaluated once per query, not once per row, so you get stuck with the same value for all rows of tableA.
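
The range arithmetic above can be checked with a short Python sketch. The random integer here is only a stand-in for CHECKSUM(NEWID()), assumed to behave like a signed 32-bit value; the function name is made up for illustration.

```python
import random


def random_1_to_n(n, rng):
    """Mimic 1 + abs(checksum(newid())) % N with a fresh random value per call."""
    checksum = rng.randint(-2**31 + 1, 2**31 - 1)  # stand-in for CHECKSUM(NEWID())
    return 1 + abs(checksum) % n


rng = random.Random(0)
values = [random_1_to_n(9, rng) for _ in range(1000)]
print(min(values), max(values))  # always within 1..9
```

The important part in the SQL version is that the expression is re-evaluated for every row, which is what a per-call random value models here.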
The query:
SELECT *
FROM tableA A
JOIN (select *, rn=row_number() over (order by newid())
from tableB) B ON B.rn <= 1 + abs(checksum(newid())) % 9
(assuming you wanted up to 9 random rows of B per A)
Assuming tableB has an integer surrogate key, try:
Declare @maxRecs integer = 11 -- Maximum number of b records per a record
Select a.*, b.*
From tableA a Join tableB b
-- 1 + ... avoids a zero divisor; the cast keeps % on integers
On b.PKColumn % (1 + cast(floor(Rand() * @maxRecs) as int)) = 0
If you have a fixed number that you know in advance (such as 3), then:
select a.*, b.*
from a cross join
(select top 3 * from b) b
If you want a random number of rows from "b" for each row in "a", the problem is a bit harder in SQL Server.
Here's an example of how this could be done. The code is self-contained: copy it and press F5 ;)
-- create two tables we can join
DECLARE @datatable TABLE(ID INT)
DECLARE @randomtable TABLE(ID INT)
-- add some dummy data
DECLARE @i INT = 1
WHILE(@i < 3) BEGIN
INSERT INTO @datatable (ID) VALUES (@i)
SET @i = @i + 1
END
SET @i = 1
WHILE(@i < 100) BEGIN
INSERT INTO @randomtable (ID) VALUES (@i)
SET @i = @i + 1
END
--The key here being the ORDER BY newid() which makes sure that
--the TOP 3 is different every time
SELECT
d.ID AS DataID
,rtable.ID RandomRow
FROM @datatable d
LEFT JOIN (SELECT TOP 3 * FROM @randomtable ORDER BY newid()) as rtable ON 1 = 1

How can I extend this SQL query to find the k nearest neighbors?

I have a database full of two-dimensional data - points on a map. Each record has a field of the geometry type. What I need to be able to do is pass a point to a stored procedure which returns the k nearest points (k would also be passed to the sproc, but that's easy). I've found a query at http://blogs.msdn.com/isaac/archive/2008/10/23/nearest-neighbors.aspx which gets the single nearest neighbour, but I can't figure how to extend it to find the k nearest neighbours.
This is the current query - T is the table, g is the geometry field, @x is the point to search around, Numbers is a table with integers 1 to n:
DECLARE @start FLOAT = 1000;
WITH NearestPoints AS
(
SELECT TOP(1) WITH TIES *, T.g.STDistance(@x) AS dist
FROM Numbers JOIN T WITH(INDEX(spatial_index))
ON T.g.STDistance(@x) < @start*POWER(2,Numbers.n)
ORDER BY n
)
SELECT TOP(1) * FROM NearestPoints
ORDER BY n, dist
The inner query selects the nearest non-empty region and the outer query then selects the top result from that region; the outer query can easily be changed to (e.g.) SELECT TOP(20), but if the nearest region only contains one result, you're stuck with that.
I figure I probably need to recursively search for the first region containing k records, but without using a table variable (which would cause maintenance problems as you have to create the table structure and it's liable to change - there're lots of fields), I can't see how.
What happens if you remove TOP (1) WITH TIES from the inner query, and set the outer query to return the top k rows?
I'd also be interested to know whether this amendment helps at all. It ought to be more efficient than using TOP:
DECLARE @start FLOAT = 1000
,@k INT = 20
,@p FLOAT = 2;
WITH NearestPoints AS
(
SELECT *
,T.g.STDistance(@x) AS dist
,ROW_NUMBER() OVER (ORDER BY T.g.STDistance(@x)) AS rn
FROM Numbers
JOIN T WITH(INDEX(spatial_index))
ON T.g.STDistance(@x) < @start*POWER(@p,Numbers.n)
AND (Numbers.n - 1 = 0
OR T.g.STDistance(@x) >= @start*POWER(@p,Numbers.n - 1)
)
)
SELECT *
FROM NearestPoints
WHERE rn <= @k;
NB - untested - I don't have access to SQL 2008 here.
Quoted from Inside Microsoft® SQL Server® 2008: T-SQL Programming. Section 14.8.4.
The following query will return the 10 points of interest nearest to @input:
DECLARE @input GEOGRAPHY = 'POINT (-147 61)';
DECLARE @start FLOAT = 1000;
WITH NearestNeighbor AS(
SELECT TOP 10 WITH TIES
*, b.GEOG.STDistance(@input) AS dist
FROM Nums n JOIN GeoNames b WITH(INDEX(geog_hhhh_16_sidx)) -- index hint
ON b.GEOG.STDistance(@input) < @start*POWER(CAST(2 AS FLOAT),n.n)
AND b.GEOG.STDistance(@input) >=
CASE WHEN n = 1 THEN 0 ELSE @start*POWER(CAST(2 AS FLOAT),n.n-1) END
WHERE n <= 20
ORDER BY n
)
SELECT TOP 10 geonameid, name, feature_code, admin1_code, dist
FROM NearestNeighbor
ORDER BY n, dist;
Note: Only part of this query's WHERE clause is supported by the spatial index. However, the query optimizer correctly evaluates the supported part (the "<" comparison) using the index. This restricts the number of rows for which the ">=" part must be tested, and the query performs well. Changing the value of @start can sometimes speed up the query if it is slower than desired.
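
The banding idea behind the query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses plain planar Euclidean distance rather than STDistance, has no spatial index, and the function names are made up. Each point falls into band n, the smallest n with dist < start * 2^n, and the result is ordered by band and then by distance.

```python
import math


def band_number(dist, start, factor=2.0):
    """Smallest n >= 1 such that dist < start * factor**n."""
    n = 1
    while dist >= start * factor ** n:
        n += 1
    return n


def k_nearest(points, origin, k, start=1000.0):
    """Return the k points nearest to origin, ordered by (band, distance)."""
    ranked = []
    for p in points:
        d = math.hypot(p[0] - origin[0], p[1] - origin[1])
        ranked.append((band_number(d, start), d, p))
    ranked.sort(key=lambda item: (item[0], item[1]))
    return [p for _, _, p in ranked[:k]]


points = [(0, 0), (3000, 0), (500, 0), (10000, 0), (1500, 0)]
print(k_nearest(points, (0, 0), 3))
```

In this sketch the bands do not change the final ordering; in the SQL query their purpose is to let the spatial index discard distant rows before distances are compared.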
Listing 2-1. Creating and Populating Auxiliary Table of Numbers
SET NOCOUNT ON;
USE InsideTSQL2008;
IF OBJECT_ID('dbo.Nums', 'U') IS NOT NULL DROP TABLE dbo.Nums;
CREATE TABLE dbo.Nums(n INT NOT NULL PRIMARY KEY);
DECLARE @max AS INT, @rc AS INT;
SET @max = 1000000;
SET @rc = 1;
INSERT INTO Nums VALUES(1);
WHILE @rc * 2 <= @max
BEGIN
INSERT INTO dbo.Nums SELECT n + @rc FROM dbo.Nums;
SET @rc = @rc * 2;
END
INSERT INTO dbo.Nums
SELECT n + @rc FROM dbo.Nums WHERE n + @rc <= @max;
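
The listing fills the table by doubling: each pass inserts a shifted copy of everything already present, and a final insert tops the table up to the maximum. A minimal Python simulation of that logic (a list stands in for dbo.Nums, and the function name is made up for illustration):

```python
def build_nums(max_n):
    """Simulate the doubling inserts from the listing using a plain list."""
    nums = [1]   # INSERT INTO Nums VALUES(1)
    rc = 1       # number of rows currently in the table
    while rc * 2 <= max_n:
        # INSERT ... SELECT n + rc FROM Nums: doubles the row count
        nums.extend([n + rc for n in nums])
        rc *= 2
    # Final top-up: only the values that do not exceed max_n
    nums.extend([n + rc for n in nums if n + rc <= max_n])
    return nums


print(len(build_nums(1000)))  # number of rows generated for max_n = 1000
```

For a maximum of 1,000,000 this takes roughly twenty INSERT statements in total, compared with a million single-row inserts in a simple counter loop.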