Generate Random Test Data with ORDER BY NEWID(), include duplicate rows - sql

I need to select random rows from a table for test data. There may be times I need more rows of test data than there are records in the table. Duplicates are okay. How do I structure my select so that I can get duplicate rows?
CREATE TABLE [Northwind].[dbo].[Persons]
(PersonID int, LastName varchar(255))
INSERT INTO [Northwind].[dbo].[Persons]
VALUES
(1, 'Smith'),
(2, 'Jones'),
(3, 'Washington')
SELECT TOP 5 *
FROM [Northwind].[dbo].[Persons]
ORDER BY NEWID()
How do I get the Select statement to give me five records in random order, with repeats? Currently, it only returns three in random order.
I'd like to be able to extend this to get 100 rows or 1000 rows or however many I need.

Use a recursive CTE to union copies of the table until you have at least as many rows as you need. Then select from that as you did before. The CTE needs a name other than the table's own name; otherwise the reference in the anchor member resolves to the CTE itself.
declare
@desired int = 5,
@actual int = (select count(*) from Persons);
with
repeated as (
select personId,
lastName,
batch = 0
from Persons
union all
select personId,
lastName,
batch = batch + 1
from repeated
where (batch + 1) * @actual < @desired
)
select
top (@desired) personId, lastName
from repeated
order by newid()

As mentioned, you could instead use a tally table and then get the random rows:
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3, N N4) --Repeat for more
SELECT TOP 500 YT.*
FROM Tally T
CROSS JOIN YourTable YT
ORDER BY NEWID();
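Applied to the Persons sample from the question, the tally approach could look like the sketch below. The table variable @Persons (standing in for [Northwind].[dbo].[Persons]), the @desired variable, and the @sample results table are my own illustrative names, and the 100-row tally size is an assumption to adjust.

```sql
-- Sample data; @Persons is a stand-in for the real Persons table (assumption).
DECLARE @Persons table (PersonID int, LastName varchar(255));
INSERT INTO @Persons VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Washington');

DECLARE @desired int = 5;  -- hypothetical: number of test rows wanted

-- Results are collected in a table variable so they can be inspected afterwards.
DECLARE @sample table (PersonID int, LastName varchar(255));

WITH N AS (
    -- 10 placeholder rows
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)
),
Tally AS (
    -- 10 x 10 = 100 rows; each tally row yields one more copy of every person
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2
)
INSERT INTO @sample (PersonID, LastName)
SELECT TOP (@desired) P.PersonID, P.LastName
FROM Tally T
CROSS JOIN @Persons P
ORDER BY NEWID();

SELECT * FROM @sample;
```

The tally only needs enough rows that (tally rows) x (table rows) is at least @desired; a larger tally just means more rows to sort.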

I was thinking of how you would solve this without ordering all the records, especially multiple times.
One method is to generate random numbers and to use those for looking up values in your data:
with n as (
select rand(checksum(newid())) as r, 1 as n
union all
select rand(checksum(newid())) as r, n + 1
from n
where n < 10
),
tt as (
select t.*, lag(tile_end, 1, 0) over (order by tile_end) as tile_start
from (select t.*, row_number() over (order by newid()) * 1.0 / count(*) over () as tile_end
from t
) t
)
select tt.*, n.r, (select count(*) from n)
from n left join
tt
on n.r >= tt.tile_start and n.r < tt.tile_end;
The row_number() does not need to use order by newid(). It can order by a key that has an index, which makes that component much more efficient.
For more than 100 rows, you will need OPTION (MAXRECURSION 0).

I added a results table variable, looped over the query, and inserted each batch of results into it.
declare @results table(
SSN varchar(10),
Cusip varchar(10),
...
EndBillingDate varchar(10))
DECLARE @cnt INT = 0;
WHILE @cnt < @trades
BEGIN
INSERT INTO @results
Select ...
set @cnt = @cnt + 10
END
select * from @results
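A self-contained sketch of that kind of loop, using the Persons sample from the question in place of the elided trade columns. The names @Persons, @wanted, @added and the two-column @results table are illustrative assumptions, not the poster's actual schema.

```sql
DECLARE @Persons table (PersonID int, LastName varchar(255));
INSERT INTO @Persons VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Washington');

DECLARE @wanted int = 10;  -- hypothetical target row count
DECLARE @results table (PersonID int, LastName varchar(255));
DECLARE @cnt int = 0;
DECLARE @added int;

WHILE @cnt < @wanted
BEGIN
    -- Each pass adds at most one shuffled copy of the source rows,
    -- capped so the total never exceeds @wanted.
    INSERT INTO @results (PersonID, LastName)
    SELECT TOP (@wanted - @cnt) PersonID, LastName
    FROM @Persons
    ORDER BY NEWID();

    SET @added = @@ROWCOUNT;   -- rows inserted by the statement above
    IF @added = 0 BREAK;       -- empty source table: avoid looping forever
    SET @cnt = @cnt + @added;
END;

SELECT * FROM @results;
```

Counting the rows actually inserted, rather than adding a fixed step, keeps the loop correct when the source table has fewer rows than the batch size.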

Related

SQL query runs forever when using a column instead of a string constant in where clause

I want to generate date ranges from whatever the max date in CustomersHub is to some other date say, '2022-11-16'.
Using a column in the WHERE clause makes the query seemingly run forever. I checked my stop condition to make sure the query doesn't run infinitely, and as a test I substituted a constant that was essentially the same as the value in my table column. With the constant, the query works fine.
Here are the two tables:
create table #Temp
(
dt DateTime,
);
create table CustomersHub
(
id int,
firstloadedDate DateTime,
);
This is how I'm inserting data into the temp table:
insert into #Temp
select top 1 hub.firstloadedDate max_date from CustomersHub hub order by max_date desc;
And finally the query to generate date ranges:
WITH e00(n)
AS (SELECT 1
UNION ALL
SELECT 1),
e02(n)
AS (SELECT 1
FROM [e00] [a],
[e00] [b]),
e04(n)
AS (SELECT 1
FROM [e02] [a],
[e02] [b]),
e08(n)
AS (SELECT 1
FROM [e04] [a],
[e04] [b]),
e16(n)
AS (SELECT 1
FROM [e08] [a],
[e08] [b]),
e32(n)
AS (SELECT 1
FROM [e16] [a],
[e16] [b]),
num_tally(n)
AS (SELECT Row_number()
OVER (
ORDER BY ( SELECT NULL) )
FROM [e32]),
tally
AS (SELECT Dateadd(day, n - 1, dt) dates,
n,
dt
FROM [num_tally],
#temp
WHERE Datediff(day, dt, '2022-11-16') >= n)
SELECT *
FROM tally
DROP TABLE #temp
In the tally cte, if I replace dt with some date like '2022-10-20' then it works otherwise it just keeps running & running.
I would, instead, generate a Tally with enough numbers, and then just use DATEADD:
DECLARE @FromDate date,
@ToDate date;
SELECT @FromDate = MAX(hub.firstloadedDate),
@ToDate = '20221116'
FROM dbo.CustomersHub hub;
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS (
SELECT 0 AS I
UNION ALL
SELECT TOP (DATEDIFF(DAY,@FromDate,@ToDate))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3, N N4) --Up to 10,000 rows
SELECT DATEADD(DAY, T.I, @FromDate) AS Dt
FROM Tally T;

How To Create Duplicate Records depending on Column which indicates on Repetition

I've got a table consisting of aggregated records, and I need to split them according to a specific column ('Shares Bought' in the example below), as follows:
Original Table:
Requested Table:
Needless to say, there are more records like that in the table and I need an automated query (not manual insertions),
and there are also some more attributes that I will need to duplicate (like the field 'Date').
You need to first generate rows with an increasing row number and then join them to your table.
For example:
create table t(rowid int, name varchar(100),shares_bought int, date_val date)
insert into t
select *
from (values (1,'Dan',2,'2018-08-23')
,(2,'Mirko',1,'2018-08-25')
,(3,'Shuli',3,'2018-05-14')
,(4,'Regina',1,'2018-01-19')
)t(x,y,z,a);
with generate_data
as (select top (select max(shares_bought) from t)
row_number() over(order by (select null)) as rnk /* This would generate rows starting from 1,2,3 etc*/
from sys.objects a
cross join sys.objects b
)
select row_number() over(order by t.rowid) as rowid,t.name,1 as shares_bought,t.date_val
from t
join generate_data gd
on gd.rnk <=t.shares_bought /* generate rows up and until the number of shares bought*/
order by 1
Here is a db fiddle link
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=5736255585c3ab2c2964c655bec9e08b
declare @t table (rowid int, name varchar(100), sb int, dt date);
insert into @t values
(1, 'Dan', 2, '20180823'),
(2, 'Mirco', 1, '20180825'),
(3, 'Shuli', 3, '20180514'),
(4, 'Regina', 1, '20180119');
with nums as
(
select n
from (values(1), (2), (3), (4)) v(n)
)
select t.*
from @t t
cross apply (select top (t.sb) *
from nums) a;
Use a table of numbers instead of the nums CTE, or add as many values to it as the largest value in the Shares Bought column.
Another option is to use a recursive CTE:
with t as (
select 1 as RowId, Name, ShareBought, Date
from YourTable -- your source table
union all
select RowId+1, Name, ShareBought, Date
from t
where RowId < ShareBought -- stop once RowId reaches ShareBought
)
select row_number() over (order by name) as RowId,
Name, 1 as ShareBought, Date
from t;
If ShareBought can exceed 100, add the option (maxrecursion 0) query hint, because recursion is limited to 100 levels by default.
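To show where the hint goes, here is a sketch with a hypothetical table variable @trades in which one row has ShareBought = 150, enough to exceed the default limit of 100 recursion levels. The table, its sample rows, and the @split results table are assumptions for illustration only.

```sql
-- Hypothetical sample data; 'Bulk' needs 149 recursion levels.
DECLARE @trades table (Name varchar(100), ShareBought int, [Date] date);
INSERT INTO @trades VALUES ('Dan', 2, '20180823'), ('Bulk', 150, '20180825');

DECLARE @split table (RowId int, Name varchar(100), ShareBought int, [Date] date);

WITH t AS (
    SELECT 1 AS RowId, Name, ShareBought, [Date]
    FROM @trades
    UNION ALL
    SELECT RowId + 1, Name, ShareBought, [Date]
    FROM t
    WHERE RowId < ShareBought      -- stop once RowId reaches ShareBought
)
INSERT INTO @split (RowId, Name, ShareBought, [Date])
SELECT ROW_NUMBER() OVER (ORDER BY Name), Name, 1, [Date]
FROM t
OPTION (MAXRECURSION 0);  -- 0 removes the limit; the hint goes at the end of the outer statement

SELECT COUNT(*) AS total_rows FROM @split;
```

With the hint removed, this statement should stop with a "maximum recursion 100 has been exhausted" error, because the 'Bulk' row recurses past level 100.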

SQL to get sequence of phone numbers

I have a table called PhoneNumbers with columns Phone and Range, as below.
The Phone column holds a phone number, and the Range column holds the last digits of the final number in the range that I need to generate. For the first phone number, 9125678463, I need to include the phone numbers up to 9125678465, i.e. (9125678463, 9125678464, 9125678465), and similarly for the other phone numbers. This is what the destination table should look like:
How can I write the SQL to get this?
Thanks in advance
I have a solution that goes the classic way, BUT it needs neither recursion nor loops! And it works whether your range has a length of 3, 5, or whatever.
First I create a table of numbers (from 1 to 1 million in this example; you can adapt this in the TOP () clause):
SELECT TOP (1000000) n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
INTO dbo.Numbers
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);
CREATE UNIQUE CLUSTERED INDEX idx_numbers ON dbo.Numbers(n)
;
Once you have that table, it's pretty simple:
;WITH phonenumbers
AS
(
SELECT phone,
[range],
CAST(RIGHT(phone,LEN([range])) AS INT) AS number_to_increase,
CAST(LEFT(phone,LEN(phone)-LEN([range])) + REPLICATE('0',LEN([range])) AS BIGINT) AS base_number
FROM PhoneNumbers
)
SELECT p.base_number + num.n
FROM phonenumbers p
INNER JOIN dbo.Numbers num ON num.n BETWEEN p.number_to_increase AND p.[range]
You don't have to use a CTE here; it's only there to make the idea behind this approach clearer. Maybe this suits you.
You can use a CTE like this:
;WITH CTE (PhoneNumbers, [Range], i) AS (
SELECT CAST(Phone AS bigint), [Range], CAST(1 AS bigint)
FROM yourTable
UNION ALL
SELECT CAST(PhoneNumbers + 1 AS bigint), [Range], i + 1
FROM CTE
WHERE (PhoneNumbers + 1) % 10000 <= [Range]
)
SELECT PhoneNumbers
FROM CTE
ORDER BY PhoneNumbers
Here is one example of using a tally table. In my system I have that set of ctes as a view so I never have to write it again.
if OBJECT_ID('tempdb..#PhoneNumbers') is not null
drop table #PhoneNumbers;
create table #PhoneNumbers
(
Phone char(10)
, Range smallint
)
insert #PhoneNumbers
select 9135678463, 8465 union all
select 3279275678, 5679 union all
select 6372938103, 8105;
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select *
from #PhoneNumbers p
join cteTally t on t.N >= RIGHT(Phone, 4) and t.N <= Range
order by p.Phone
One more approach:
--Creating dummy table
select '9999991234' phone, '1237' rang into #tbl
union
select '9999995689', '5692'
SELECT [phone] low
,(CAST([phone] AS bigint) / 10000 * 10000 + [Rang]) high -- use each row's own prefix, not a hard-coded number
into #tbl1
FROM #tbl
--Creating 'numbrs' to have numbers between 0 & 9999 i.e. max range
select (rn-1)rn
into #numbrs
from
(select row_number() over (partition by null order by A.object_id) rn from sys.objects A
cross join sys.objects B)A
where rn between 0 and 9999
select (low + rn)phn from #numbrs cross join #tbl1
where (low + rn) between low and high

Inserting by joining on CTE not producing random results

I have created a cte that produces a range of numbers.
What I would like to do is join on this table and do an insert with randomly selected values, this is all to produce test data.
The random portion of the insert would look something like this:
(SELECT TOP 1 Name FROM #Titles ORDER BY NEWID())
The problem is that when I do the insert and join on the CTE, my results don't seem random at all. It always uses the same value from #Titles. (There are other tables too, like #Notes.)
WITH MyCte AS
(SELECT MyCounter = 1
UNION ALL
SELECT MyCounter + 1
FROM MyCte
where MyCounter < 100)
INSERT INTO [MyTable]
( Title, Note )
select (SELECT TOP 1 Name FROM #Titles ORDER BY NEWID()),
(SELECT TOP 1 Note FROM #Notes ORDER BY NEWID())
from mycte
The reason why you're getting the same record 100 times is that SELECT TOP 1 Name FROM #Titles ORDER BY NEWID() is executed only once and then returned for each record in mycte.
You can use a good old loop to achieve what you want. You can use that with multiple tables. It doesn't matter how many records you have in your source tables (though you should have at least 1).
declare @count int = 100
while @count > 0
begin
INSERT INTO [MyTable] ( Title, Note )
SELECT (SELECT TOP 1 Name FROM #Titles ORDER BY NEWID()),
(SELECT TOP 1 Note FROM #Notes ORDER BY NEWID())
set @count = @count - 1
end
As Szymon said in his answer, there is no need to use a CTE to get 100 random records from a table. Apart from Szymon's loop, you can achieve the same with the query below. Note that GO 100 is a feature of client tools such as SSMS and sqlcmd, which run the preceding batch 100 times; it is not a T-SQL statement.
INSERT INTO [MyTable] ( Title )
SELECT TOP 1 Name FROM #Titles ORDER BY NEWID()
GO 100
To use more than one table:
INSERT INTO [MyTable] ( Title, Note )
SELECT (SELECT TOP 1 Name FROM #Titles ORDER BY NEWID()),
(SELECT TOP 1 Note FROM #Notes ORDER BY NEWID())
GO 100
Then check the result in MyTable:
SELECT * FROM MyTable

SQL Select 'n' records without a Table

Is there a way of selecting a specific number of rows without creating a table? E.g. if I use the following:
SELECT 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
it gives me 10 columns across; I want 10 new rows.
Thanks
You can use a recursive CTE to generate an arbitrary sequence of numbers in T-SQL like so:
DECLARE @start INT = 1;
DECLARE @end INT = 10;
WITH numbers AS (
SELECT @start AS number
UNION ALL
SELECT number + 1
FROM numbers
WHERE number < @end
)
SELECT *
FROM numbers
OPTION (MAXRECURSION 0);
If you have a fixed number of rows, you can try:
SELECT 1
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
UNION
SELECT 6
UNION
SELECT 7
UNION
SELECT 8
UNION
SELECT 9
UNION
SELECT 10
This is a good way if you need a long list (so you don't need lots of UNION statements):
WITH CTE_Numbers AS (
SELECT n = 1
UNION ALL
SELECT n + 1 FROM CTE_Numbers WHERE n < 10
)
SELECT n FROM CTE_Numbers
The recursive CTE approach is really good.
Just be aware of the performance difference. Let's play with a million records:
Recursive CTE approach. Duration = 14 seconds
declare @start int = 1;
declare @end int = 999999;
with numbers as
(
select @start as number
union all
select number + 1 from numbers where number < @end
)
select * from numbers option(maxrecursion 0);
Union All + Cross Join approach. Duration = 6 seconds
with N(n) as
(
select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1
)
select top 999999
row_number() over(order by (select 1)) as number
from
N n1, N n2, N n3, N n4, N n5, N n6;
Table Value Constructor + Cross Join approach. Duration = 6 seconds
(if SQL Server >= 2008)
with N as
(
select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t(n)
)
select top 999999
row_number() over(order by (select 1)) as number
from
N n1, N n2, N n3, N n4, N n5, N n6;
Recursive CTE + Cross Join approach. :) Duration = 6 seconds
with N(n) as
(
select 1
union all
select n + 1 from N where n < 10
)
select top 999999
row_number() over(order by (select 1)) as number
from
N n1, N n2, N n3, N n4, N n5, N n6;
The difference is even more striking if we INSERT the result into a table variable:
INSERT INTO with Recursive CTE approach. Duration = 17 seconds
declare @R table (Id int primary key clustered);
with numbers as
(
select 1 as number
union all
select number + 1 from numbers where number < 999999
)
insert into @R
select * from numbers option(maxrecursion 0);
INSERT INTO with Cross Join approach. Duration = 1 second
declare @C table (Id int primary key clustered);
with N as
(
select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t(n)
)
insert into @C
select top 999999
row_number() over(order by (select 1)) as number
from
N n1, N n2, N n3, N n4, N n5, N n6;
Here is an interesting article about Tally Tables
SELECT 1
UNION
SELECT 2
UNION
...
UNION
SELECT 10 ;
Using spt_values table:
SELECT TOP (1000) n = ROW_NUMBER() OVER (ORDER BY number)
FROM [master]..spt_values ORDER BY n;
Or if the value needed is less than 1k:
SELECT DISTINCT n = number FROM master..[spt_values] WHERE number BETWEEN 1 AND 1000;
This is a table that is used by internal stored procedures for various purposes. Its use online seems to be quite prevalent, even though it is undocumented and unsupported, may disappear one day, and contains only a finite, non-unique, and non-contiguous set of values. There are 2,164 unique and 2,508 total values in SQL Server 2008 R2; in 2012 there are 2,167 unique and 2,515 total. This includes duplicates and negative values, and even with DISTINCT there are plenty of gaps once you get beyond the number 2,048. So the workaround is to use ROW_NUMBER() to generate a contiguous sequence, starting at 1, based on the values in the table.
In addition, to generate more than about 2,000 rows you could join the table with itself, but in common cases the table by itself is enough.
Performance-wise, it shouldn't be too bad (generating a million records took 10 seconds on my laptop), and the query is quite easy to read.
Source: http://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
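For the self-join mentioned above, a sketch might look like this. The temp table name #BigNumbers and the @rows variable are my own, and the same caveat applies: spt_values is undocumented, so treat this as something for throwaway test scripts rather than production code.

```sql
DECLARE @rows int = 1000000;  -- hypothetical number of rows wanted

-- Cross joining the ~2,500 rows with themselves gives several million
-- combinations; ROW_NUMBER() turns them into a gap-free sequence.
SELECT TOP (@rows) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
INTO #BigNumbers
FROM master..spt_values AS a
CROSS JOIN master..spt_values AS b
ORDER BY n;  -- makes TOP keep the smallest numbers, i.e. 1 through @rows

SELECT MIN(n) AS first_n, MAX(n) AS last_n, COUNT(*) AS total FROM #BigNumbers;
```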
Using UNPIVOT (for some cases it would be overkill)
DECLARE @Items TABLE(a int, b int, c int, d int, e int);
INSERT INTO @Items
VALUES(1, 2, 3, 4, 5)
SELECT Items
FROM @Items as p
UNPIVOT
(Items FOR Seq IN
([a], [b], [c], [d], [e]) ) AS unpvt
;WITH nums AS
(SELECT 1 AS value
UNION ALL
SELECT value + 1 AS value
FROM nums
WHERE nums.value <= 99)
SELECT *
FROM nums
Using GENERATE_SERIES - SQL Server 2022
Generates a series of numbers within a given interval. The interval and the step between series values are defined by the user.
SELECT value
FROM GENERATE_SERIES(1, 10); -- released SQL Server 2022 takes positional arguments (start, stop [, step])
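As a sketch of how this could replace a tally CTE, here is the date-range case from earlier in the thread written with GENERATE_SERIES. It assumes SQL Server 2022 with database compatibility level 160, uses the positional (start, stop) form, and the two date values and the @dates table variable are hypothetical.

```sql
DECLARE @FromDate date = '20221020';  -- hypothetical lower bound
DECLARE @ToDate   date = '20221116';  -- hypothetical upper bound

DECLARE @dates table (Dt date);

-- One series value per day offset, from 0 up to the number of days between the bounds.
INSERT INTO @dates (Dt)
SELECT DATEADD(DAY, s.value, @FromDate)
FROM GENERATE_SERIES(0, DATEDIFF(DAY, @FromDate, @ToDate)) AS s;

SELECT Dt FROM @dates ORDER BY Dt;
```

Both bounds are inclusive, so this returns every date from @FromDate through @ToDate.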