SQL: how to get random number of rows from one table for each row in another

SQL: how to get random number of rows from one table for each row in another - sql

I have two tables where the data is not related
For each row in table A i want e.g. 3 random rows in table B
This is fairly easy using a cursor, but it is awfully slow
So how can i express this in single statement to avoid RBAR ?

To get a random number between 0 and (N-1), you can use.
abs(checksum(newid())) % N
Which means to get positive values 1-N, you use
1 + abs(checksum(newid())) % N
Note: RAND() doesn't work - it is evaluated once per query batch and you get stuck with the same value for all rows of tableA.
The query:
SELECT *
FROM tableA A
JOIN (select *, rn=row_number() over (order by newid())
from tableB) B ON B.rn <= 1 + abs(checksum(newid())) % 9
(assuming you wanted up to 9 random rows of B per A)

assuming tableB has integer surrogate key, try
Declare #maxRecs integer = 11 -- Maximum number of b records per a record
Select a.*, b.*
From tableA a Join tableB b
On b.PKColumn % (floor(Rand() * #maxRecs)) = 0

If you have a fixed number that you know in advance (such as 3), then:
select a.*, b.*
from a cross join
(select top 3 * from b) b
If you want a random number of rows from "b" for each row in "a", the problem is a bit harder in SQL Server.

Heres an example of how this could be done, code is self contained, copy and press F5 ;)
-- create two tables we can join
DECLARE #datatable TABLE(ID INT)
DECLARE #randomtable TABLE(ID INT)
-- add some dummy data
DECLARE #i INT = 1
WHILE(#i < 3) BEGIN
INSERT INTO #datatable (ID) VALUES (#i)
SET #i = #i + 1
END
SET #i = 1
WHILE(#i < 100) BEGIN
INSERT INTO #randomtable (ID) VALUES (#i)
SET #i = #i + 1
END
--The key here being the ORDER BY newid() which makes sure that
--the TOP 3 is different every time
SELECT
d.ID AS DataID
,rtable.ID RandomRow
FROM #datatable d
LEFT JOIN (SELECT TOP 3 * FROM #randomtable ORDER BY newid()) as rtable ON 1 = 1
Heres an example of the output

Related

SQL Server - loop through table and update based on count

I have a SQL Server database. I need to loop through a table to get the count of each value in the column 'RevID'. Each value should only be in the table a certain number of times - for example 125 times. If the count of the value is greater than 125 or less than 125, I need to update the column to ensure all values in the RevID (are over 25 different values) is within the same range of 125 (ok to be a few numbers off)
For example, the count of RevID = "A2" is = 45 and the count of RevID = 'B2' is = 165 then I need to update RevID so the 45 count increases and the 165 decreases until they are within the 125 range.
This is what I have so far:
DECLARE #i INT = 1,
#RevCnt INT = SELECT RevId, COUNT(RevId) FROM MyTable group by RevId
WHILE(#RevCnt >= 50)
BEGIN
UPDATE MyTable
SET RevID= (SELECT COUNT(RevID) FROM MyTable)
WHERE RevID < 50)
#i = #i + 1
END
I have also played around with a cursor and instead of trigger. Any idea on how to achieve this? Thanks for any input.

Okay I cam back to this because I found it interesting even though clearly there are some business rules/discussion that you and I and others are not seeing. anyway, if you want to evenly and distribute arbitrarily there are a few ways you could do it by building recursive Common Table Expressions [CTE] or by building temp tables and more. Anyway here is a way that I decided to give it a try, I did utilize 1 temp table because sql was throwing in a little inconsistency with the main logic table as a cte about every 10th time but the temp table seems to have cleared that up. Anyway, this will evenly spread RevId arbitrarily and randomly assigning any remainder (# of Records / # of RevIds) to one of the RevIds. This script also doesn't rely on having a UniqueID or anything it works dynamically over row numbers it creates..... here you go just subtract out test data etc and you have what you more than likely want. Though rebuilding the table/values would probably be easier.
--Build Some Test Data
DECLARE #Table AS TABLE (RevId VARCHAR(10))
DECLARE #C AS INT = 1
WHILE #C <= 400
BEGIN
IF #C <= 200
BEGIN
INSERT INTO #Table (RevId) VALUES ('A1')
END
IF #c <= 170
BEGIN
INSERT INTO #Table (RevId) VALUES ('B2')
END
IF #c <= 100
BEGIN
INSERT INTO #Table (RevId) VALUES ('C3')
END
IF #c <= 400
BEGIN
INSERT INTO #Table (RevId) VALUES ('D4')
END
IF #c <= 1
BEGIN
INSERT INTO #Table (RevId) VALUES ('E5')
END
SET #C = #C+ 1
END
--save starting counts of test data to temp table to compare with later
IF OBJECT_ID('tempdb..#StartingCounts') IS NOT NULL
BEGIN
DROP TABLE #StartingCounts
END
SELECT
RevId
,COUNT(*) as Occurences
INTO #StartingCounts
FROM
#Table
GROUP BY
RevId
ORDER BY
RevId
/************************ This is the main method **********************************/
--clear temp table that is the main processing logic
IF OBJECT_ID('tempdb..#RowNumsToChange') IS NOT NULL
BEGIN
DROP TABLE #RowNumsToChange
END
--figure out how many records there are and how many there should be for each RevId
;WITH cteTargetNumbers AS (
SELECT
RevId
--,COUNT(*) as RevIdCount
--,SUM(COUNT(*)) OVER (PARTITION BY 1) / COUNT(*) OVER (PARTITION BY 1) +
--CASE
--WHEN ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY NEWID()) <=
--SUM(COUNT(*)) OVER (PARTITION BY 1) % COUNT(*) OVER (PARTITION BY 1)
--THEN 1
--ELSE 0
--END as TargetNumOfRecords
,SUM(COUNT(*)) OVER (PARTITION BY 1) / COUNT(*) OVER (PARTITION BY 1) +
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY NEWID()) <=
SUM(COUNT(*)) OVER (PARTITION BY 1) % COUNT(*) OVER (PARTITION BY 1)
THEN 1
ELSE 0
END - COUNT(*) AS NumRecordsToUpdate
FROM
#Table
GROUP BY
RevId
)
, cteEndRowNumsToChange AS (
SELECT *
,SUM(CASE WHEN NumRecordsToUpdate > 1 THEN NumRecordsToUpdate ELSE 0 END)
OVER (PARTITION BY 1 ORDER BY RevId) AS ChangeEndRowNum
FROM
cteTargetNumbers
)
SELECT
*
,LAG(ChangeEndRowNum,1,0) OVER (PARTITION BY 1 ORDER BY RevId) as ChangeStartRowNum
INTO #RowNumsToChange
FROM
cteEndRowNumsToChange
;WITH cteOriginalTableRowNum AS (
SELECT
RevId
,ROW_NUMBER() OVER (PARTITION BY RevId ORDER BY (SELECT 0)) as RowNumByRevId
FROM
#Table t
)
, cteRecordsAllowedToChange AS (
SELECT
o.RevId
,o.RowNumByRevId
,ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY (SELECT 0)) as ChangeRowNum
FROM
cteOriginalTableRowNum o
INNER JOIN #RowNumsToChange t
ON o.RevId = t.RevId
AND t.NumRecordsToUpdate < 0
AND o.RowNumByRevId <= ABS(t.NumRecordsToUpdate)
)
UPDATE o
SET RevId = u.RevId
FROM
cteOriginalTableRowNum o
INNER JOIN cteRecordsAllowedToChange c
ON o.RevId = c.RevId
AND o.RowNumByRevId = c.RowNumByRevId
INNER JOIN #RowNumsToChange u
ON c.ChangeRowNum > u.ChangeStartRowNum
AND c.ChangeRowNum <= u.ChangeEndRowNum
AND u.NumRecordsToUpdate > 0
IF OBJECT_ID('tempdb..#RowNumsToChange') IS NOT NULL
BEGIN
DROP TABLE #RowNumsToChange
END
/***************************** End of Main Method *******************************/
-- Compare the results and clean up
;WITH ctePostUpdateResults AS (
SELECT
RevId
,COUNT(*) as AfterChangeOccurences
FROM
#Table
GROUP BY
RevId
)
SELECT *
FROM
#StartingCounts s
INNER JOIN ctePostUpdateResults r
ON s.RevId = r.RevId
ORDER BY
s.RevId
IF OBJECT_ID('tempdb..#StartingCounts') IS NOT NULL
BEGIN
DROP TABLE #StartingCounts
END

Since you've given no rules for how you'd like the balance to operate we're left to speculate. Here's an approach that would find the most overrepresented value and then find an underrepresented value that can take on the entire overage.
I have no idea how optimal this is and it will probably run in an infinite loop without more logic.
declare #balance int = 125;
declare #cnt_over int;
declare #cnt_under int;
declare #revID_overrepresented varchar(32);
declare #revID_underrepresented varchar(32);
declare #rowcount int = 1;
while #rowcount > 0
begin
select top 1 #revID_overrepresented = RevID, #cnt_over = count(*)
from T
group by RevID
having count(*) > #balance
order by count(*) desc
select top 1 #revID_underrepresented = RevID, #cnt_under = count(*)
from T
group by RevID
having count(*) < #balance - #cnt_over
order by count(*) desc
update top #cnt_over - #balance T
set RevId = #revID_underrepresented
where RevId = #revID_overrepresented;
set #rowcount = ##rowcount;
end

The problem is I don't even know what you mean by balance...You say it needs to be evenly represented but it seems like you want it to be 125. 125 is not "even", it is just 125.
I can't tell what you are trying to do, but I'm guessing this is not really an SQL problem. But you can use SQL to help. Here is some helpful SQL for you. You can use this in your language of choice to solve the problem.
Find the rev values and their counts:
SELECT RevID, COUNT(*)
FROM MyTable
GROUP BY MyTable
Update #X rows (with RevID of value #RevID) to a new value #NewValue
UPDATE TOP #X FROM MyTable
SET RevID = #NewValue
WHERE RevID = #RevID
Using these two queries you should be able to apply your business rules (which you never specified) in a loop or whatever to change the data.

Populating an empty table with sequential numbers

I have a table which is already truncated (Microsoft SQL 2008). I have to now populate it with sequential numbers up to 50,000 records arbitrary numbers (doesn't mater) up to 7 characters.
Can any one help as to what SQL statement I need to write that will automatically populate the newly empty table with A000001,A0000002,A0000003, etc so that I can sort number the records within the table.
I have approximately 50000 records which I need to sequentially entered and I really don't want to number the column manually via hand editing.
Thanks in advance.

I'd use excel to generate your unique ids using the following:
In A column:
=CONCATENATE($C2, TEXT($B2,"000000"))
In B column put a 1 in the first row and the following code in all subsequent rows:
=SUM($B4 + 1)
In C column:
The letter A
Then just import the excel csv as a table and you'll have all your ids ready to insert into your empty table.

The SQL below loads a table variable up. Just select from it and insert the data into the new table. Certainly not the model of efficiency, but it'll get the job done.
DECLARE #tmp TABLE(
Value NVARCHAR(10)
)
DECLARE #Counter INT=0
DECLARE #Padding NVARCHAR(20)
WHILE #Counter<50000
BEGIN
SET #Counter=#Counter+1
SET #Padding=
CASE LEN(CONVERT(NVARCHAR,#Counter))
WHEN 1 THEN '00000'
WHEN 2 THEN '0000'
WHEN 3 THEN '000'
WHEN 4 THEN '00'
WHEN 5 THEN '0'
ELSE ''
END
INSERT INTO #tmp SELECT 'A' + #Padding + CONVERT(NVARCHAR,#Counter)
END
select * from #tmp

Use Stacked CTE to generate sequential Numbers
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e2 CROSS JOIN e2 AS b), -- 100*100
e4(n) AS (SELECT 1 FROM e3 CROSS JOIN (SELECT TOP 5 n FROM e1) AS b) -- 5*10000
SELECT n = 'A'+right('000000'+
convert(varchar(20),ROW_NUMBER() OVER (ORDER BY n)),7)
FROM e4 ORDER BY n;
Check here for more methods to generate sequential numbers with performance analysis

Use a table with an identity column and populate it. Then update that table to set the alpha value you need as follows:
create table MyTable (
ID int not null identity(1,1),
Alpha varchar(30)
)
truncate table MyTable
begin tran -- makes it run much faster
declare #i int
select #i = 1
while #i < 1000000
begin
insert into MyTable (Alpha) values ('')
select #i = #i + 1
end
commit
update MyTable set Alpha = 'A' + replicate('0', 6 - len(cast(ID as varchar(30)))) + cast(ID as varchar(30))

Fastest method to insert numbers into sql server table?

SELECT TOP 1000000 row_number() over(ORDER by sv.number) AS num
INTO numbertest
from master..spt_values sv CROSS JOIN master..spt_values sv2
SELECT TOP 1000000 IDENTITY(int,1,1) AS Number
INTO NumberTest
FROM master..spt_values sv1
CROSS JOIN master..spt_values s2
I've come across two methods to insert 1 to 1000000 numbers in a table which works perfectly but doesn't insert 1 to 1000000 sequentially? how can i insert sequentially with fast insertion rate?

I have table Numbers in my database that i fill with the following query.
CREATE TABLE [dbo].[NUMBERS] (
[number] INT IDENTITY (1, 1) NOT NULL
);
set Identity_insert dbo.Numbers oN
declare
#row_count int,
#target_rows int
set #target_rows = 1048576
set #row_count = null
while ( 1 = 1 ) begin
if ( #row_count is null ) begin
insert into Numbers ( [number] ) values ( 1 )
end
else begin
insert into Numbers ( [number] )
select [number] = [number] + #row_count
from Numbers
end
set #row_count = isnull( #row_count, 0 ) + ##rowcount
if ( #row_count >= #target_rows ) begin
break
end
end

what I understood that you need to add/insert rows with serial number 1 to 10,00,000 (ten lac rows)
your two command seems select command instead of insert command
do really required to add serial number field in one of your filed
or
you can run query to any existing table to get result with serial number like
for eg, table name : employee
filed name : name
Note : no fileds existing like SerialNumber
you can run a command to get output with serial number
e.g. select ROW_NUMBER() over (order by employee.name) as SerialNumber, Employee.name from employee
your result will be like
SerialNumber name
1 abc
2 xyz

Sequential numbers randomly selected and added to table

The SO Question has lead me to the following question.
If a table has 16 rows I'd like to add a field to the table with the numbers 1,2,3,4,5,...,16 arranged randomly i.e in the 'RndVal' field for row 1 this could be 2, then for row 2 it could be 5 i.e each of the 16 integers needs to appear once without repetition.
Why doesn't the following work? Ideally I'd like to see this working then to see alternative solutions.
This creates the table ok:
IF OBJECT_ID('tempdb..#A') IS NOT NULL BEGIN DROP TABLE #A END
IF OBJECT_ID('tempdb..#B') IS NOT NULL BEGIN DROP TABLE #B END
IF OBJECT_ID('tempdb..#C') IS NOT NULL BEGIN DROP TABLE #C END
IF OBJECT_ID('tempdb..#myTable') IS NOT NULL BEGIN DROP TABLE #myTable END
CREATE TABLE #B (B_ID INT)
CREATE TABLE #C (C_ID INT)
INSERT INTO #B(B_ID) VALUES
(10),
(20),
(30),
(40)
INSERT INTO #C(C_ID)VALUES
(1),
(2),
(3),
(4)
CREATE TABLE #A
(
B_ID INT
, C_ID INT
, RndVal INT
)
INSERT INTO #A(B_ID, C_ID, RndVal)
SELECT
#B.B_ID
, #C.C_ID
, 0
FROM #B CROSS JOIN #C;
Then I'm attempting to add the random column using the following. The logic is to add random numbers between 1 and 16 > then to effectively overwrite any that are duplicated with other numbers > in a loop ...
SELECT
ROW_NUMBER() OVER(ORDER BY B_ID) AS Row
, B_ID
, C_ID
, RndVal
INTO #myTable
FROM #A
DECLARE #rowsRequired INT = (SELECT COUNT(*) CNT FROM #myTable)
DECLARE #i INT = (SELECT #rowsRequired - SUM(CASE WHEN RndVal > 0 THEN 1 ELSE 0 END) FROM #myTable)--0
DECLARE #end INT = 1
WHILE #end > 0
BEGIN
SELECT #i = #rowsRequired - SUM(CASE WHEN RndVal > 0 THEN 1 ELSE 0 END) FROM #myTable
WHILE #i>0
BEGIN
UPDATE x
SET x.RndVal = FLOOR(RAND()*#rowsRequired)
FROM #myTable x
WHERE x.RndVal = 0
SET #i = #i-1
END
--this is to remove possible duplicates
UPDATE c
SET c.RndVal = 0
FROM
#myTable c
INNER JOIN
(
SELECT RndVal
FROM #myTable
GROUP BY RndVal
HAVING COUNT(RndVal)>1
) t
ON
c.RndVal = t.RndVal
SET #end = ##ROWCOUNT
END
TRUNCATE TABLE #A
INSERT INTO #A
SELECT
B_ID
, C_ID
, RndVal
FROM #myTable
If the original table has 6 rows then the result should end up something like this
B_ID|C_ID|RndVal
----------------
| | 5
| | 4
| | 1
| | 6
| | 3
| | 2

I don't understand your code, frankly
This will update each row with a random number, non-repeated number between 1 and the number of rows in the table
UPDATE T
SET SomeCol = T2.X
FROM
MyTable T
JOIN
(
SELECT
KeyCol, ROW_NUMBER() OVER (ORDER BY NEWID()) AS X
FROM
MyTable
) T2 ON T.KeyCol = T2.KeyCol
This is more concise but can't test to see if it works as expected
UPDATE T
SET SomeCol = X
FROM
(
SELECT
SomeCol, ROW_NUMBER() OVER (ORDER BY NEWID()) AS X
FROM
MyTable
) T

When you add TOP (1) (because you need to update first RndVal=0 record) and +1 (because otherwise your zero mark means nothing) to your update, things will start to move. But extremely slowly (around 40 seconds on my rather outdated laptop). This is because, as #myTable gets filled with generated random numbers, it becomes less and less probable to get missing numbers - you usually get duplicate, and have to start again.
UPDATE top (1) x
SET x.RndVal = FLOOR(RAND()*#rowsRequired) + 1
FROM #myTable x
WHERE x.RndVal = 0
Of course, #gbn has perfectly valid solution.

This is basically the same as the previous answer, but specific to your code:
;WITH CTE As
(
SELECT B_ID, C_ID, RndVal,
ROW_NUMBER() OVER(ORDER BY NewID()) As NewOrder
FROM #A
)
UPDATE CTE
SET RndVal = NewOrder
SELECT * FROM #A ORDER BY RndVal

Selecting rows that don't exist physically in the database

I've totally rewritten my question because the simplicity of the previous one people were taking too literally.
The aim:
INSERT INTO X
SELECT TOP 23452345 NEWID()
This query should insert 23452345 GUIDs to the "X" table. actually 23452345 means just any possible number that is entered by user and stored in database.
So the problem is that inserting rows to a database by using
INSERT INTO ... SELECT ...
statement requires you to already have the required amount of rows inserted to database.
Naturally you can emulate the existence of rows by using temporary data and cross joining it but this (in my stupid opinion) creates more results than needed and in some extreme situations might fail due to many unpredicted reasons. I need to be sure that if user entered extremely huge number like 2^32 or even bigger the system will work and behave normally without any possible side effects like extreme memory/time consumption etc...

In all fairness I derived the idea from this site.
;WITH cte AS
(
SELECT 1 x
UNION ALL
SELECT x + 1
FROM cte
WHERE x < 100
)
SELECT NEWID()
FROM cte
EDIT:
The general method we're seeing is to select from a table that has the desired number of rows. It's hackish, but you can create a table, insert the desired number of records, and select from it.
create table #num
(
num int
)
declare #i int
set #i = 1
while (#i <= 77777)
begin
insert into #num values (#i)
set #i = #i + 1
end
select NEWID() from #num
drop table #num

Of course creating a Number table is the best approach and will come in handy. You should definitely have one at your disposal. If you need something as a one-off just join to a known table. I usually use a system table such as spt_values:
declare #result table (id uniqueidentifier)
declare #sDate datetime
set #sDate = getdate();
;with num (n)
as ( select top(777777) row_number() over(order by t1.number) as N
from master..spt_values t1
cross join master..spt_values t2
)
insert into #result(id)
select newid()
from num;
select datediff(ms, #sDate, getdate()) [elasped]

I'd create an integers table and use it. This type of table comes in handy many situations.
CREATE TABLE dbo.Integers
(
i INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
)
WHILE COALESCE(SCOPE_IDENTITY(), 0) <= 100000 /* or some other large value */
BEGIN
INSERT dbo.Integers DEFAULT VALUES
END
Then all you need to do it:
SELECT NEWID()
FROM Integers
WHERE i <= 77777

Try this:
with
L0 as (select 1 as C union all select 1) --2 rows
,L1 as (select 1 as C from L0 as A, L0 as B) --4 rows
,L2 as (select 1 as C from L1 as A, L1 as B) --16 rows
,L3 as (select 1 as C from L2 as A, L2 as B) --256 rows
select top 100 newid() from L3

SELECT TOP 100 NEWID() from sys.all_columns
Or any other datasource that has a large number of records. You can build your own table for 'counting' functionality as such, you can use it in lieu of while loops.
Tally tables: http://www.sqlservercentral.com/articles/T-SQL/62867

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL: how to get random number of rows from one table for each row in another - sql

I have two tables where the data is not related For each row in table A i want e.g. 3 random rows in table B This is fairly easy using a cursor, but it is awfully slow So how can i express this in single statement to avoid RBAR ?

assuming tableB has integer surrogate key, try Declare #maxRecs integer = 11 -- Maximum number of b records per a record Select a., b. From tableA a Join tableB b On b.PKColumn % (floor(Rand() * #maxRecs)) = 0

If you have a fixed number that you know in advance (such as 3), then: select a., b. from a cross join (select top 3 * from b) b If you want a random number of rows from "b" for each row in "a", the problem is a bit harder in SQL Server.

Related

SQL Server - loop through table and update based on count

Populating an empty table with sequential numbers

Fastest method to insert numbers into sql server table?

Sequential numbers randomly selected and added to table

Selecting rows that don't exist physically in the database

Categories

Resources

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL: how to get random number of rows from one table for each row in another - sql

I have two tables where the data is not related For each row in table A i want e.g. 3 random rows in table B This is fairly easy using a cursor, but it is awfully slow So how can i express this in single statement to avoid RBAR ?

assuming tableB has integer surrogate key, try Declare #maxRecs integer = 11 -- Maximum number of b records per a record Select a.*, b.* From tableA a Join tableB b On b.PKColumn % (floor(Rand() * #maxRecs)) = 0

If you have a fixed number that you know in advance (such as 3), then: select a.*, b.* from a cross join (select top 3 * from b) b If you want a random number of rows from "b" for each row in "a", the problem is a bit harder in SQL Server.

Related

SQL Server - loop through table and update based on count

Populating an empty table with sequential numbers

Fastest method to insert numbers into sql server table?

Sequential numbers randomly selected and added to table

Selecting rows that don't exist physically in the database

Categories

Resources

assuming tableB has integer surrogate key, try Declare #maxRecs integer = 11 -- Maximum number of b records per a record Select a., b. From tableA a Join tableB b On b.PKColumn % (floor(Rand() * #maxRecs)) = 0

If you have a fixed number that you know in advance (such as 3), then: select a., b. from a cross join (select top 3 * from b) b If you want a random number of rows from "b" for each row in "a", the problem is a bit harder in SQL Server.