Permuting values in SQL - sql

Let's say I have a table with two columns:
id | value
----------
1 | 101
2 | 356
3 | 28
I need to randomly permute the value column so that each id is randomly assigned a new value from the existing set {101,356,28}. How could I do this in Oracle SQL?
It may sound odd but this is a real problem, just with more columns.

You can do this by using row_number() with a random number generator and then joining back to the original rows:
with cte as (
select id, value,
row_number() over (order by id) as i,
row_number() over (order by dbms_random.random) as rand_i
from table t
)
select cte.id, cte1.value
from cte join
cte cte1
on cte.i = cte.rand_i;
This guarantees a permutation (i.e. no original row has its value used twice).
EDIT:
By the way, if the original ids are sequential from 1 and have no gaps, you could just do:
select row_number() over (order by dbms.random) as id, value
from table t;

An Option : select * from x_table where id = round(dbms_random.value() * 3) + 1; [Here 3 is the number of rows in your random data table and I am assuming that id is incremental and unique?]
I'll think of other options.

I'm not sure whether this is the right task for SQL database. Maybe you should implement something like this:
Factoradic permutation - in PL/SQL and then return a cursor via PIPE ROW construct. Ordering by dbms.random might be slow for large data sets.

Related

SQL query looping for each value in a list

New to SQL here - I am trying to get 1 row from a table matching to a particular criteria
Typically this would look like
SELECT TOP 1 *
FROM myTable
WHERE id = 'abc'
The output may look like
value id
--------------
1 abc
The table has many entries for an 'id', and I am trying to get one entry per 'id'. Now I have list of 'id's. How would I execute something like
SELECT TOP 1 *
FROM myTable
FOR EACH id
WHERE id IN ('abc', 'edf', 'fgh')
Expecting result like
value id
--------------
1 abc
10 edf
12 fgh
I do not know if it is some sort union or concat operation, but would like to learn. I am working on Azure SQL Server
The table has many entries for an 'id', and I am trying to get one entry per 'id'. Now I have list of 'id's.
A typical method is row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by id) as seqnum
from mytable t
) t
where seqnum = 1;
Note: you can filter on particular ids, if you want. It is unclear if that is really required for your question.
If you happen to be using SQL Server (as select top suggests), you can use the more concise, but somewhat less performant:
select top (1) with ties t.*
from mytable t
order by row_number() over (order by id order by (select null));

How to get nth record in a sql server table without changing the order?(sql server)

for example i have data like this(sql server)
id name
4 anu
3 lohi
1 pras
2 chand
i want 2nd record in a table (means 3 lohi)
if i use row_number() function its changes the order and i get (2 chand)
i want 2nd record from table data
can anyonr please give me the query fro above scenario
There is no such thing as the nth row in a table. And for a simple reason: SQL tables represent unordered sets (technically multi-sets because they allow duplicates).
You can do what you want use offset/fetch:
select t.*
from t
order by id desc
offset 1 fetch first 1 row only;
This assumes that the descending ordering on id is what you want, based on your example data.
You can also do this using row_number():
select t.*
from (select t.*,
row_number() over (order by id desc) as seqnum
from t
) t
where seqnum = 2;
I should note that that SQL Server allows you to assign row_number() without having an effective sort using something like this:
select t.*
from (select t.*,
row_number() over (order by (select NULL)) as seqnum
from t
) t
where seqnum = 2;
However, this returns an arbitrary row. There is no guarantee it returns the same row each time it runs, nor that the row is "second" in any meaningful use of the term.

Select multiple columns having distinct just in 3 of them

i've got a table that i need to return about 14 column values but only return 1 row for the duplicates on some of the columns.
The second problem is that between the duplicates i need to keep the one that has the biggest int in one of the columns that is not required to be unique.
Since the Table is somewhat big, I am seeking advice into doing this in the most efficient way.
should i be doing a group by?
my table is somewhat like this, i will simplify the number of columns.
ID(UniqueIdentifier) | ACCID(UniqueIdentifier) | DateTime(DateTime) | distance(int)|type(int)
28761188-0886-E911-822F-DD1FA635D450 1238FD8A-BD00-411A-A81C-0F6F5C026BCC 2019-06-03 14:04:41.000 2 3
41761188-0886-E911-822F-DD1FA635D450 1238FD8A-BD00-411A-A81C-0F6F5C026BCC 2019-06-03 14:04:41.000 1 3
I should be only selecting when ACCID and DATETIME is unique, the column ID in primary so will never be duplicate, and i need to keep the row with the biggest distance.
You can use the ROW_NUMBER() window function, as in:
select *
from (
select
id,
accid,
datetime,
distance,
type,
row_number() over(partition by accid, datetime order by type desc) as rn
from t
) x
where rn = 1
If you want to show multiple "ties", then replace ROW_NUMBER() by RANK().
I would suggest a correlated subquery with the right index as the fastest method:
select t.*
from t
where t.id = (select top (1) t2.id
from t t2
where t2.ACCID = t.ACCID
order by t2.distance desc
) ;
The best index is on (ACCID, distance desc, id).

How to find first duplicate row in a table sql server

I am working on SQL Server. I have a table, that contains around 75000 records. Among them there are several duplicate records. So i wrote a query to know which record repeated how many times like,
SELECT [RETAILERNAME],COUNT([RETAILERNAME]) as Repeated FROM [Stores] GROUP BY [RETAILERNAME]
It gives me result like,
---------------------------
RETAILERNAME | Repeated
---------------------------
X | 4
---------------------------
Y | 6
---------------------------
Z | 10
---------------------------
Among 4 record(s) of X record, i need take only first record of X.
so here i want to retrieve all fields from first row of duplicate records. i.e. Take all records whose RETAILERNAME='X' we will get some no. of duplicate records, we need to get only first row from them.
Please guide me.
You could try using ROW_NUMBER.
Something like
;WITH Vals AS (
SELECT [RETAILERNAME],
ROW_NUMBER() OVER(PARTITION BY [RETAILERNAME] ORDER BY [RETAILERNAME]) RowID
FROM [Stores ]
)
SELECT *
FROm Vals
WHERE RowID = 1
SQL Fiddle DEMO
You can then also remove the duplicates if need be (BUT BE CAREFUL THIS IS PERMANENT)
;WITH Vals AS (
SELECT [RETAILERNAME],
ROW_NUMBER() OVER(PARTITION BY [RETAILERNAME] ORDER BY [RETAILERNAME]) RowID
FROM Stores
)
DELETE
FROM Vals
WHERE RowID > 1;
You Can write query as under
SELECT TOP 1 * FROM [Stores] GROUP BY [RETAILERNAME]
HAVING your condition
WITH cte
AS (SELECT [retailername],
Row_number()
OVER(
partition BY [retailername]
ORDER BY [retailername])'RowRank'
FROM [retailername])
SELECT *
FROM cte

How can I get a random cartesian product in PostgreSQL?

I have two tables, custassets and tags. To generate some test data I'd like to do an INSERT INTO a many-to-many table with a SELECT that gets random rows from each (so that a random primary key from one table is paired with a random primary key from the second). To my surprise this isn't as easy as I first thought, so I'm persisting with this to teach myself.
Here's my first attempt. I select 10 custassets and 3 tags, but both are the same in each case. I'd be fine with the first table being fixed, but I'd like to randomise the tags assigned.
SELECT
custassets_rand.id custassets_id,
tags_rand.id tags_rand_id
FROM
(
SELECT id FROM custassets WHERE defunct = false ORDER BY RANDOM() LIMIT 10
) AS custassets_rand
,
(
SELECT id FROM tags WHERE defunct = false ORDER BY RANDOM() LIMIT 3
) AS tags_rand
This produces:
custassets_id | tags_rand_id
---------------+--------------
9849 | 3322 }
9849 | 4871 } this pattern of tag PKs is repeated
9849 | 5188 }
12145 | 3322
12145 | 4871
12145 | 5188
17837 | 3322
17837 | 4871
17837 | 5188
....
I then tried the following approach: doing the second RANDOM() call in the SELECT column list. However this one was worse, as it chooses a single tag PK and sticks with it.
SELECT
custassets_rand.id custassets_id,
(SELECT id FROM tags WHERE defunct = false ORDER BY RANDOM() LIMIT 1) tags_rand_id
FROM
(
SELECT id FROM custassets WHERE defunct = false ORDER BY RANDOM() LIMIT 30
) AS custassets_rand
Result:
custassets_id | tags_rand_id
---------------+--------------
16694 | 1537
14204 | 1537
23823 | 1537
34799 | 1537
36388 | 1537
....
This would be easy in a scripting language, and I'm sure can be done quite easily with a stored procedure or temporary table. But can I do it just with a INSERT INTO SELECT?
I did think of choosing integer primary keys using a random function, but unfortunately the primary keys for both tables have gaps in the increment sequences (and so an empty row might be chosen in each table). That would have been fine otherwise!
Note that what you are looking for is not a Cartesian product, which would produce n*m rows; rather a random 1:1 association, which produces GREATEST(n,m) rows.
To produce truly random combinations, it's enough to randomize rn for the bigger set:
SELECT c_id, t_id
FROM (
SELECT id AS c_id, row_number() OVER (ORDER BY random()) AS rn
FROM custassets
) x
JOIN (SELECT id AS t_id, row_number() OVER () AS rn FROM tags) y USING (rn);
If arbitrary combinations are good enough, this is faster (especially for big tables):
SELECT c_id, t_id
FROM (SELECT id AS c_id, row_number() OVER () AS rn FROM custassets) x
JOIN (SELECT id AS t_id, row_number() OVER () AS rn FROM tags) y USING (rn);
If the number of rows in both tables do not match and you do not want to lose rows from the bigger table, use the modulo operator % to join rows from the smaller table multiple times:
SELECT c_id, t_id
FROM (
SELECT id AS c_id, row_number() OVER () AS rn
FROM custassets -- table with fewer rows
) x
JOIN (
SELECT id AS t_id, (row_number() OVER () % small.ct) + 1 AS rn
FROM tags
, (SELECT count(*) AS ct FROM custassets) AS small
) y USING (rn);
Window functions were added with PostgreSQL 8.4.
WITH a_ttl AS (
SELECT count(*) AS ttl FROM custassets c),
b_ttl AS (
SELECT count(*) AS ttl FROM tags),
rows AS (
SELECT gs.*
FROM generate_series(1,
(SELECT max(ttl) AS ttl FROM
(SELECT ttl FROM a_ttl UNION SELECT ttl FROM b_ttl) AS m))
AS gs(row)),
tab_a_rand AS (
SELECT custassets_id, row_number() OVER (order by random()) as row
FROM custassets),
tab_b_rand AS (
SELECT id, row_number() OVER (order by random()) as row
FROM tags)
SELECT a.custassets_id, b.id
FROM rows r
JOIN a_ttl ON 1=1 JOIN b_ttl ON 1=1
LEFT JOIN tab_a_rand a ON a.row = (r.row % a_ttl.ttl)+1
LEFT JOIN tab_b_rand b ON b.row = (r.row % b_ttl.ttl)+1
ORDER BY 1,2;
You can test this query on SQL Fiddle.
Here is a different approach to pick a single combination from 2 tables by random, assuming two tables a and b, both with primary key id. The tables needn't be of same size, and the second row is independently chosen from the first, which might not be that important for testdata.
SELECT * FROM a, b
WHERE a.id = (
SELECT id
FROM a
OFFSET (
SELECT random () * (SELECT count(*) FROM a)
)
LIMIT 1)
AND b.id = (
SELECT id
FROM b
OFFSET (
SELECT random () * (SELECT count(*) FROM b)
)
LIMIT 1);
Tested with two tables, one of size 7000 rows, one with 100k rows, result: immediately. For more than one result, you have to call the query repeatedly - increasing the LIMIT and changing x.id = to x.id IN would produce (aA, aB, bA, bB) result patterns.
It bugs me that after all these years of relational databases, there doesn't seem to be very good cross database ways of doing things like this. The MSDN article http://msdn.microsoft.com/en-us/library/cc441928.aspx seems to have some interesting ideas, but of course that's not PostgreSQL. And even then, their solution requires a single pass, when I'd think it ought to be able to be done without the scan.
I can imagine a few ways that might work without a pass (in selection), but it would involve creating another table that maps your table's primary keys to random numbers (or to linear sequences that you later randomly select, which in some ways may actually be better), and of course, that may have issues as well.
I realize this is probably a non-useful comment, I just felt I needed to rant a bit.
If you just want to get a random set of rows from each side, use a pseudo-random number generator. I would use something like:
select *
from (select a.*, row_number() over (order by NULL) as rownum -- NULL may not work, "(SELECT NULL)" works in MSSQL
from a
) a cross join
(select b.*, row_number() over (order by NULL) as rownum
from b
) b
where a.rownum <= 30 and b.rownum <= 30
This is doing a Cartesian product, which returns 900 rows assuming a and b each have at least 30 rows.
However, I interpreted your question as getting random combinations. Once again, I'd go for the pseudo-random approach.
select *
from (select a.*, row_number() over (order by NULL) as rownum -- NULL may not work, "(SELECT NULL)" works in MSSQL
from a
) a cross join
(select b.*, row_number() over (order by NULL) as rownum
from b
) b
where modf(a.rownum*107+b.rownum*257+17, 101) < <some vaue>
This let's you get combinations among arbitrary rows.
Just a plain carthesian product ON random() appears to work reasonably well. Simple comme bonjour...
-- Cartesian product
-- EXPLAIN ANALYZE
INSERT INTO dirgraph(point_from,point_to,costs)
SELECT p1.the_point , p2.the_point, (1000*random() ) +1
FROM allpoints p1
JOIN allpoints p2 ON random() < 0.002
;