Joining two tables at random - sql

I have two tables, one with main data, and another shorter table with additional data.
I would like to join the rows from the shorter table to some of the rows of the main table, at random. For example:
main table:
id  data
1   apple
2   banana
3   cherry
4   date
5   elderberry
6   fig
secondary table:
id  data
1   accordion
2   banjo
Desired Result:
main  secondary
… ?   accordion
… ?   banjo
I can think of one way to do it, using a lot of pre-processing with CTEs:
WITH
cte1 AS (SELECT data FROM main ORDER BY random() LIMIT 2),
cte2 AS (SELECT row_number() OVER() AS row, data FROM cte1),
cte3 AS (SELECT row_number() OVER () AS row, data FROM secondary)
SELECT *
FROM cte2 JOIN cte3 ON cte2.row=cte3.row;
It works, but is there a more straightforward way of joining two tables at random?
I have attached a fiddle: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=21af08976112c7ac7c18329fa3699b8c&hide=2

A CTE is basically just a re-usable template for a subquery.
So this can be code-golfed down to two subqueries.
SELECT m.rn, m.data main_data, s.data secondary_data
FROM (SELECT data, ROW_NUMBER() OVER (ORDER BY random()) rn FROM main) m
JOIN (SELECT data, ROW_NUMBER() OVER (ORDER BY random()) rn FROM secondary) s USING (rn)

I could rewrite it to this:
SELECT *
FROM (SELECT row_number() OVER (ORDER BY random()) as id,
data
FROM main
ORDER BY RANDOM()) m1
JOIN secondary s on s.id = m1.id
Update: LIMIT is not needed after looking at @LukStorm's version.
I assumed that you know which table is shorter, so there is only one column with generated ids.
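Joining on s.id only pairs every secondary row if secondary.id is a gap-free sequence starting at 1. As a minimal sketch, assuming PostgreSQL and the main and secondary tables from the question, the secondary rows can be numbered as well, so the join no longer depends on the stored ids (the aliases rn, main_data and secondary_data are illustrative):

```sql
-- Assumes PostgreSQL and the main/secondary tables from the question.
-- main is numbered in random order; secondary is numbered by id so that
-- gaps in secondary.id do not cause rows to be dropped.
SELECT m.data AS main_data,
       s.data AS secondary_data
FROM (SELECT data,
             row_number() OVER (ORDER BY random()) AS rn
      FROM main) AS m
JOIN (SELECT data,
             row_number() OVER (ORDER BY id) AS rn
      FROM secondary) AS s
  ON s.rn = m.rn;
```

Only one side needs the random ordering: pairing a shuffled list with a fixed list is already a random pairing.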

Related

Enrich table with data from other table

Currently I have 2 tables: one with client data (CLIENT_TABLE) and one with gift card information (GIFTCARD_TABLE).
The GIFTCARD_TABLE consists of 100 rows and it has 2 columns: Card_Number and Pin_Code.
Now I need to enrich the CLIENT_TABLE (35 rows) with the 2 columns from the GIFTCARD_TABLE, so every client needs one card_number with its corresponding pin_code and it doesn't matter which one (just don't use the same card number & pin_code twice).
Since these tables don't have any keys which I can use, I don't know how I can do this.
Any suggestions how I can tackle this?
Kind regards
If you want to assign the cards truly at random, you need:
select *
from
( -- random row_numbers
select dt.*,
row_number() over (order by rnd) as rn
from
( -- 35 random clients
select t.*, random(1,1000000000) as rnd
from CLIENT_TABLE as t
sample randomized allocation 35
) as dt
) as client
join
( -- random row_numbers
select dt.*,
row_number() over (order by rnd) as rn
from
(
select t.*, random(1,1000000000) as rnd
from GIFTCARD_TABLE as t
) as dt
) as card
on client.rn = card.rn
RANDOM can't be used directly in the ORDER BY of ROW_NUMBER, which is why the random value is computed in a derived table first.
Schematically (I may be incorrect in Teradata syntax):
UPDATE
    -- this copy of the table will be updated
    CLIENT_TABLE c1
    -- this subquery numbers the clients; the join attaches a row number to each row of the first copy
    JOIN ( SELECT id, ROW_NUMBER() OVER (ORDER BY id) rn
           FROM CLIENT_TABLE ) c2 ON c1.id = c2.id
    -- this subquery numbers the cards; the join assigns one card to each client by matching `rn`
    JOIN ( SELECT *, ROW_NUMBER() OVER (ORDER BY Card_Number) rn
           FROM GIFTCARD_TABLE ) g ON c2.rn = g.rn
SET c1.Card_Number = g.Card_Number,
    c1.Pin_Code = g.Pin_Code;

Sql Query to return 3 rows for every entry with the same value in column?

OK, I'm a little out of practice on SQL queries here. I have a table with thousands of entries.
Each row has a unique Id, but there is a column named EquipmentId which is not unique and would be present in several rows. I want to return 3 rows for every EquipmentId, and if there are fewer than 3 entries for an EquipmentId I want those too. Does that make sense? Thanks in advance.
Use ROW_NUMBER() + CTE
;WITH CTE AS(
SELECT *,
ROW_NUMBER() OVER ( PARTITION BY EquipmentId ORDER BY ID ) RN
FROM TableName
)
SELECT *
FROM CTE
WHERE RN <= 3
ORDER BY EquipmentId
Using subqueries you can do it like this:
SELECT *
FROM
    (SELECT *,
            RANK() OVER (PARTITION BY EquipmentId ORDER BY ID) AS rn
     FROM stack) AS a
WHERE
    rn <= 3

Return two rows from SQL table with a difference in values [duplicate]

I am trying to return 2 rows from a table that have different values. Not being an SQL wise man, I am stuck. Any help would be appreciated :-)
TABLE A:
NAME DATA
Oscar HOME1
Jens HOME2
Will HOME1
Jeremy HOME2
Al HOME1
Result, should be 2 random rows with a difference in DATA value
NAME DATA
Oscar HOME1
Jeremy HOME2
Anyone?
Easy way to have random data.
;with tblA as (
select name,data,
row_number() over(partition by data order by newid()) rn
from A
)
select name,data
from tblA
where rn = 1
It could be that you need
select * from my_table a
inner join my_table b on a.data !=b.data
where a.data in ( SELECT data FROM my_table ORDER BY RAND() LIMIT 1);
For your code
SELECT *
FROM [dbo].[ComputerState] as a
INNER JOIN [dbo].[ComputerState] as b ON a.ServiceName != b.ServiceName
WHERE a.ServiceName IN (
SELECT top 1 [ServiceName] FROM [dbo].[ComputerState]
);
If the question is really this simple, you can use an aggregate such as MAX() or MIN() to grab one row for each different DATA:
SELECT MAX(NAME), DATA
FROM TABLE_A
GROUP BY DATA
Of course, if any other variables are introduced to the requirements, this may no longer work.
;WITH cteA AS (
    SELECT
        name
        ,data
        ,ROW_NUMBER() OVER (PARTITION BY data ORDER BY NEWID()) AS DataRowNumber
    FROM
        A
), cteB AS (
    -- keep one random row per distinct data value, then number those rows randomly
    SELECT
        name
        ,data
        ,ROW_NUMBER() OVER (ORDER BY NEWID()) AS RandomRowNumber
    FROM
        cteA
    WHERE
        DataRowNumber = 1
)
SELECT name, data
FROM
    cteB
WHERE
    RandomRowNumber <= 2
This expands on @AlexKudryashev's answer a little.
;with tblA as (
select name,data,
row_number() over(partition by data order by newid()) rn
from A
)
select name,data
from tblA
where rn = 1
The only issue with his version is that the number of rows where rn = 1 depends on COUNT(DISTINCT data), so it can return more than 2 rows. To fix that, one could add a SELECT TOP 2 clause, but without an ORDER BY the result might not be fully random: which rows come back depends on how SQL Server executes the query, which is likely to be consistent from run to run. To get truly random results, add a second random row number and limit the results to the top 2 of those.
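A shorter variant of the same idea, as a sketch assuming SQL Server and the table A(name, data) from the question: keep one random row per data value, then sort those rows by NEWID() so that TOP (2) picks among them at random. The CTE name OnePerData is only illustrative.

```sql
-- Assumes SQL Server and the table A(name, data) from the question.
WITH OnePerData AS (
    SELECT name,
           data,
           -- rn = 1 marks one randomly chosen row per distinct data value
           ROW_NUMBER() OVER (PARTITION BY data ORDER BY NEWID()) AS rn
    FROM A
)
SELECT TOP (2) name, data
FROM OnePerData
WHERE rn = 1
ORDER BY NEWID();  -- random order, so TOP (2) is a random pair of data values
```

If A holds only one distinct data value, this returns a single row.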

how to join two tables in sql server with out duplication

Hi I have two tables A and B
Table A:
Order Pick up
100 Toronto
100 Mississauga
100 Scarborough
Table B
Order Drop off
100 Oakvile
100 Hamilton
100 Milton
Please let me know how I can get this output (i.e. I just want to put the fields from B on the right-hand side of A):
Order pickup Dropoff
100 Toronto oakvile
100 Mississauga Hamilton
100 Scarborough Milton
How can I write a query for this? I tried to join on a.rownum = b.rownum, but had no luck.
As the OP has not mentioned any RDBMS, I am taking the liberty of assuming SQL Server 2008. If the OP wants, the following query can easily be converted to any other RDBMS.
select A.[Order],
ROW_NUMBER() OVER(ORDER BY A.[Pick up]) rn1,
A.[Pick up]
into A1
FROM A
;
select B.[Order],
ROW_NUMBER() OVER(ORDER BY B.[Drop off]) rn2,
B.[Drop off]
into B1
FROM B
;
Select A1.[Order],
A1.[Pick up],
B1.[Drop off]
FROM A1
INNER JOIN B1 on A1.rn1=B1.rn2
From the use of rownum, I'm presuming that you are using Oracle. You can attempt the following:
select a.Order as "order", a.Pickup, b.DropOff
from (select a.*, rownum as seqnum
from a
) a join
(select b.*, rownum as seqnum
from b
) b
on a.order = b.order and a.seqnum = b.seqnum;
(This assumes that all orders match up exactly.)
I must emphasize that although this might seem to work (and it should work on small examples), it will not work in general. And, it will not work on data that has deleted records. And, it probably won't work on parallel systems. If you have a small amount of data, I'd suggest dumping it in Excel and doing the work there -- that way, you can see if the pairs make sense.
Also, if you do have a column that specifies the ordering, then basically the same structure will work:
select coalesce(a.Order, b.Order) as "order", a.Pickup, b.DropOff
from (select a.*,
row_number() over (partition by "order" order by <ordering field>) as seqnum
from a
) a join
(select b.*,
row_number() over (partition by "order" order by <ordering field>) as seqnum
from b
) b
on a.order = b.order and a.seqnum = b.seqnum;
I'd use a CTE along with the ROW_NUMBER windowing function.
WITH keyed_A AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id
,[Order]
,[Pick Up]
FROM A
), keyed_B AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id
,[Order]
,[Drop Off]
FROM B
)
SELECT
a.[Pick Up]
,b.[Drop Off]
FROM keyed_A AS a
INNER JOIN keyed_B AS b
ON a.id = b.id
;
The CTE can be thought of as a virtual table with an id that crosses the two tables. The OVER clause with the windowing function ROW_NUMBER can be used to create an id in the CTE. Since we are relying on the physical storage of the records (not a good idea, please add keys to the tables), we can ORDER BY (SELECT NULL), which means: just number the rows in whatever order they are read.
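The sample data holds a single order. As a sketch of how the same CTE approach might extend to several orders, assuming SQL Server and the tables A([Order], [Pick up]) and B([Order], [Drop off]) from the question, the numbering can restart for each order and the join can match on both the order and the sequence number (the alias seq is illustrative):

```sql
-- Assumes SQL Server and tables A([Order], [Pick up]) and B([Order], [Drop off]).
-- The pairing within an order is still arbitrary, because neither table has
-- a column that defines a row order.
WITH keyed_A AS (
    SELECT [Order],
           [Pick up],
           ROW_NUMBER() OVER (PARTITION BY [Order] ORDER BY (SELECT NULL)) AS seq
    FROM A
), keyed_B AS (
    SELECT [Order],
           [Drop off],
           ROW_NUMBER() OVER (PARTITION BY [Order] ORDER BY (SELECT NULL)) AS seq
    FROM B
)
SELECT a.[Order],
       a.[Pick up],
       b.[Drop off]
FROM keyed_A AS a
INNER JOIN keyed_B AS b
    ON a.[Order] = b.[Order]
   AND a.seq = b.seq;
```

With an INNER JOIN, an order that has more pick-ups than drop-offs (or the reverse) loses its unmatched rows; a FULL OUTER JOIN would keep them.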

Select independent distinct with one query

I need to select distinct values from multiple columns in an H2 database so I can have a list of suggestions for the user based on what is in the database. In other words, I need something like
SELECT DISTINCT a FROM table
SELECT DISTINCT b FROM table
SELECT DISTINCT c FROM table
in one query. In case I am not clear enough, I want a query that given this table (columns ID, thing, other, stuff)
0 a 5 p
1 b 5 p
2 a 6 p
3 c 5 p
would result in something like this
a 5 p
b 6 -
c - -
where '-' is an empty entry.
This is a bit complicated, but you can do it as follows:
select max(thing) as thing, max(other) as other, max(stuff) as stuff
from ((select row_number() over (order by id) as seqnum, thing, NULL as other, NULL as stuff
from (select thing, min(id) as id from t group by thing
) t
) union all
(select row_number() over (order by id) as seqnum, NULL, other, NULL
from (select other, min(id) as id from t group by other
) t
) union all
(select row_number() over (order by id) as seqnum, NULL, NULL, stuff
from (select stuff, min(id) as id from t group by stuff
) t
)
) t
group by seqnum
What this does is assign a sequence number to each distinct value in each column. It then combines these together into a single row for each sequence number. The combination uses the union all/group by approach. An alternative formulation uses full outer join.
This version uses the id column to keep the values in the same order as they appear in the original data.
In H2 (which was not originally on the question), you can use the rownum() function instead (documented here). You may not be able to specify the ordering however.
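The full outer join formulation mentioned above could look roughly like this. It is a sketch, assuming the table t(id, thing, other, stuff) from the question and an engine that supports CTEs and FULL OUTER JOIN (PostgreSQL, for example); the CTE names and first_id are illustrative.

```sql
-- Assumes table t(id, thing, other, stuff) and an engine with CTEs and
-- FULL OUTER JOIN support (for example PostgreSQL).
WITH d_thing AS (
    -- number each distinct value by the id of its first appearance
    SELECT thing, row_number() OVER (ORDER BY first_id) AS seqnum
    FROM (SELECT thing, min(id) AS first_id FROM t GROUP BY thing) x
),
d_other AS (
    SELECT other, row_number() OVER (ORDER BY first_id) AS seqnum
    FROM (SELECT other, min(id) AS first_id FROM t GROUP BY other) x
),
d_stuff AS (
    SELECT stuff, row_number() OVER (ORDER BY first_id) AS seqnum
    FROM (SELECT stuff, min(id) AS first_id FROM t GROUP BY stuff) x
)
SELECT th.thing, o.other, s.stuff
FROM d_thing th
FULL OUTER JOIN d_other o ON o.seqnum = th.seqnum
-- COALESCE keeps the match working when the earlier column has run out of values
FULL OUTER JOIN d_stuff s ON s.seqnum = COALESCE(th.seqnum, o.seqnum)
ORDER BY COALESCE(th.seqnum, o.seqnum, s.seqnum);
```

The shorter columns come back as NULL where the question shows '-'.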