Assign a random order to each group - sql

I want to expand each row in TableA into 4 rows. The result should hold all the columns from TableA plus two additional columns: SetID, ranging from 0 to 3 and unique within each TableA row's group, and Random, a random permutation of SetID within the same group.
I use SQLite and would prefer a pure SQL solution.
Table A:
Description
-----------
A
B
Desired output:
Description | SetID | Random
------------|-------|-------
A | 0 | 2
A | 1 | 0
A | 2 | 3
A | 3 | 1
B | 0 | 3
B | 1 | 2
B | 2 | 0
B | 3 | 1
My attempt so far creates 4 rows for each row in TableA but doesn't get the permutation right. The column wrong contains a random number from 0 to 3, so values can repeat. I need exactly one each of 0, 1, 2 and 3 for each unique value in Description, in random order.
SELECT
Description,
SetID,
abs(random()) % 4 AS wrong
FROM
TableA
LEFT JOIN
TableB
ON
1 = 1
Table B:
SetID
-----
0
1
2
3

Use a cross join to generate the four rows per Description:
SELECT Description,
SetID,
abs(random()) % 4 AS wrong
FROM TableA
CROSS JOIN TableB
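The cross join produces the four rows per Description, but abs(random()) % 4 can still repeat values within a group. One possible way to get a real permutation in pure SQL is to number the rows of each group in random order. The sketch below assumes SQLite 3.25 or later (window functions) and uses Python's sqlite3 module only as a harness around the query; the sample data mirrors the question.

```python
import sqlite3

# In-memory database with the question's sample tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Description TEXT);
    CREATE TABLE TableB (SetID INTEGER);
    INSERT INTO TableA VALUES ('A'), ('B');
    INSERT INTO TableB VALUES (0), (1), (2), (3);
""")

# ROW_NUMBER() restarts at 1 for every Description; ordering each
# partition by random() shuffles it, and "- 1" shifts the range to 0..3.
# Assumption: the bundled SQLite supports window functions (3.25+).
rows = conn.execute("""
    SELECT Description,
           SetID,
           ROW_NUMBER() OVER (PARTITION BY Description
                              ORDER BY random()) - 1 AS Random
    FROM TableA
    CROSS JOIN TableB
    ORDER BY Description, SetID
""").fetchall()

for row in rows:
    print(row)
```

Each Description gets every value 0 to 3 exactly once in the Random column, and the order differs between runs.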

Consider a solution in your specialty, R. As you know, R maintains excellent database packages, one of which is RSQLite. Additionally, R can run commands via the connection without the need to import very large datasets.
Your solution is essentially a random sampling without replacement. Simply have R run the sampling and concatenate list items into an SQL string.
The code below creates a table in the SQLite database: R sends the CREATE TABLE command to the SQL engine, with no import or export of data. Note that the same four sampled numbers are reused for every Description; if each group of four rows needs its own permutation, run the sampling in a loop inside a function that builds the SQL string. For append queries, change CREATE TABLE AS to an INSERT INTO ... SELECT statement.
library(RSQLite)
sqlite <- dbDriver("SQLite")
conn <- dbConnect(sqlite,"C:\\Path\\To\\Database\\File\\newexample.db")
# SAMPLE WITHOUT REPLACEMENT
randomnums <- as.list(sample(0:3, 4, replace=F))
# SQL CONCATENATION
sql <- sprintf("CREATE TABLE PermutationsTable AS
SELECT a.Description, b.SetID,
(select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=0
union select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=1
union select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=2
union select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=3)
As RandomNumber
from TableA a, TableB b;",
randomnums[[1]], randomnums[[2]],
randomnums[[3]], randomnums[[4]])
# RUN QUERY
dbSendQuery(conn, sql)
dbDisconnect(conn)
You will notice a nested union subquery, which supplies the inline random number for each row. Also, no join statements are needed to return all combinations of rows from the tables; simply list the tables in the FROM clause.

Related

How can I improve SQL gap-searching code for long tables?

How can I improve (speed up) my code for long tables (1M rows)?
I have a table named names. The data in the id column is 1, 2, 5, 7.
ID | NAME
1 | Homer
2 | Bart
5 | March
7 | Lisa
I need to find the missing sequence numbers in the table.
My SQL query does find them.
It is similar to a previously asked problem, but my solution is different. I am expecting results like:
id
----
3
4
6
(3 rows)
So, my code (for PostgreSQL):
SELECT series AS id
FROM generate_series(1, (SELECT ID FROM names ORDER BY ID DESC LIMIT 1), 1)
series LEFT JOIN names ON series = names.id
WHERE id IS NULL;
Use max(id) to get the largest ID:
SELECT series AS id
FROM generate_series(1, (select max(id) from names), 1)
series LEFT JOIN names ON series = names.id
WHERE id IS NULL;
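For anyone who wants to try the same anti-join without a PostgreSQL server, here is a minimal sketch in SQLite via Python's sqlite3 module. SQLite's core has no generate_series, so a recursive CTE stands in for it here (that substitution is my assumption, not part of the answer above); the LEFT JOIN ... IS NULL part is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO names VALUES (1, 'Homer'), (2, 'Bart'), (5, 'March'), (7, 'Lisa');
""")

# The recursive CTE counts from 1 up to max(id), playing the role of
# generate_series(1, max(id), 1). Rows of the series that find no
# partner in names are the gaps.
missing = [row[0] for row in conn.execute("""
    WITH RECURSIVE series(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM series WHERE n < (SELECT max(id) FROM names)
    )
    SELECT n AS id
    FROM series
    LEFT JOIN names ON series.n = names.id
    WHERE names.id IS NULL
    ORDER BY n
""")]

print(missing)  # [3, 4, 6]
```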

How to get unique values in the self join and how to get LIMIT number dynamically in psql

Hi, I am just learning databases and practicing my skills on the table shown below:
id | name | wins | matches
-----+-------------------+------+---------
205 | Twilight Sparkle | 0 | 0
206 | Fluttershy | 0 | 0
207 | Applejack | 0 | 0
208 | Pinkie Pie | 0 | 0
209 | Rarity | 0 | 0
210 | Rainbow Dash | 0 | 0
211 | Princess Celestia | 0 | 0
212 | Princess Luna | 0 | 0
My job here is to return a list of pairs of players for the next round of a match.
Assuming that there are an even number of players registered, each player
appears exactly once in the pairings. Each player is paired with another
player with an equal or nearly-equal win record, that is, a player adjacent to him or her in the standings.
Returns:
A list of tuples, each of which contains (id1, name1, id2, name2)
id1: the first player's unique id
name1: the first player's name
id2: the second player's unique id
name2: the second player's name
To achieve those goals I self-joined the table and wrote code like this:
SELECT a.id, a.name, b.id, b.name
FROM results AS a, results AS b
WHERE a.id > b.id and a.wins = b.wins
LIMIT COUNT(a.id)/2;
It doesn't seem to work. Please help me deal with this.
Thanks.
You can sequence them based on their wins then join on the sequence, so they may have the same wins or next closest:
WITH seq_results AS
(
SELECT
id,
name,
ROW_NUMBER() OVER(ORDER BY wins DESC) AS seq
FROM
results
)
SELECT
r1.id,
r1.name,
r2.id,
r2.name
FROM
seq_results r1
JOIN
seq_results r2
ON (r1.seq = (r2.seq - 1))
AND (r2.seq % 2 = 0);
Per your request, here is some information on how this works. I highly recommend that you visit the documentation for PostgreSQL; it really is some of the best documentation out there: http://www.postgresql.org/docs/current/static/
The first part is a common-table expression (CTE). It allows me to essentially create a table in-memory for use in subsequent queries. You could just as easily create a temp table, but these don't have to be dropped, etc.
See: http://www.postgresql.org/docs/current/static/queries-with.html
WITH seq_results AS
(
SELECT
id,
name,
ROW_NUMBER() OVER(ORDER BY wins DESC) AS seq
FROM
results
)
In this CTE, I am sequencing/sequentially numbering each record using a window function. I will use these numbers later in my join. See: http://www.postgresql.org/docs/current/static/functions-window.html
SELECT
r1.id,
r1.name,
r2.id,
r2.name
FROM
seq_results r1
JOIN
seq_results r2
ON (r1.seq = (r2.seq - 1))
AND (r2.seq % 2 = 0);
Above I am joining the CTE to itself using the sequence. I "offset" the sequence of the second instance of the CTE r2 by -1, essentially joining two sequential records together.
Had I only specified that condition in the join, I would return more than the 4 records expected. I needed to make sure that the ids and names on the "left" are not also on the "right", so I decided to include only the odd-numbered sequenced records on the left and the evens on the right. To do this, I used the modulus operator % to ensure that r2 only returned records where the sequence was even.
Lastly, because the join was an inner join (JOIN is the same as INNER JOIN), any even-numbered sequences in r1 are not returned.
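The walkthrough above can be checked end to end with a small sketch. It runs the same query against SQLite through Python's sqlite3 module instead of PostgreSQL (assuming SQLite 3.25 or later for ROW_NUMBER), and it adds id as a tie-breaker in the window ordering so the pairing is reproducible when all wins are equal; that tie-breaker is my addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (id INTEGER PRIMARY KEY, name TEXT, wins INTEGER, matches INTEGER)"
)
players = [
    (205, "Twilight Sparkle"), (206, "Fluttershy"),
    (207, "Applejack"), (208, "Pinkie Pie"),
    (209, "Rarity"), (210, "Rainbow Dash"),
    (211, "Princess Celestia"), (212, "Princess Luna"),
]
conn.executemany("INSERT INTO results VALUES (?, ?, 0, 0)", players)

# Number the standings, then join each odd-numbered row (r1) to the
# even-numbered row right after it (r2).
pairs = conn.execute("""
    WITH seq_results AS (
        SELECT id, name,
               ROW_NUMBER() OVER (ORDER BY wins DESC, id) AS seq
        FROM results
    )
    SELECT r1.id, r1.name, r2.id, r2.name
    FROM seq_results r1
    JOIN seq_results r2
      ON r1.seq = r2.seq - 1
     AND r2.seq % 2 = 0
    ORDER BY r1.seq
""").fetchall()

for pair in pairs:
    print(pair)
```

With eight players this yields four tuples of (id1, name1, id2, name2), and every player appears exactly once.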

query that would count and increment the number of duplicate instances of that record

Using Access 2010.
So if I had a table
COL1
A
B
A
C
A
and then ran the query, I would get the output in COL2, where 'A' is duplicated three times and its COL2 value is incremented each time.
COL1 | COL2
A | 1
B | 1
A | 2
C | 1
A | 3
Add a field to your table. Choose AutoNumber as its data type and make it the table's primary key. I named the field ID, so my version of your sample data looks like this ...
ID COL1
1 A
2 B
3 A
4 C
5 A
The SELECT statement below returns this result set ...
ID COL1 COL2a COL2b
1 A 1 1
2 B 1 1
3 A 2 2
4 C 1 1
5 A 3 3
COL2a and COL2b show 2 methods to achieve the same result. DCount is Access-specific, and requires quotes around the m.COL1 text values. The second approach, COL2b, uses a correlated subquery, so it could work in a different database if you choose. And with that approach, you wouldn't need to bother about quoting text values.
Either approach basically requires the db engine run an additional query for each row of the result set. So, with a huge table, performance will be a concern. Indexing will help there. Add an index on COL1 if there isn't one already. ID already has an index since it's the primary key.
If you can't add a field, and the table doesn't already include another suitable field, then I think you're out of luck. You won't be able to get what you want with an Access query.
SELECT
m.ID,
m.COL1,
DCount(
"*",
"MyTable",
"COL1 = '" & m.COL1 & "' AND ID <= " & m.ID
) AS COL2a,
(
SELECT Count(*)
FROM MyTable AS m2
WHERE m2.COL1 = m.COL1 AND m2.ID <= m.ID
) AS COL2b
FROM MyTable AS m
ORDER BY m.ID;
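Since the correlated-subquery variant (COL2b) is not tied to Access, it can be tried in any engine. Below is a rough sketch using SQLite through Python's sqlite3 module; AUTOINCREMENT plays the role of the AutoNumber field, and the table and column names follow the answer's sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (ID INTEGER PRIMARY KEY AUTOINCREMENT, COL1 TEXT);
    INSERT INTO MyTable (COL1) VALUES ('A'), ('B'), ('A'), ('C'), ('A');
""")

# For each row, count the rows with the same COL1 whose ID is not
# greater than the current one: a running count per value.
rows = conn.execute("""
    SELECT m.ID,
           m.COL1,
           (SELECT COUNT(*)
            FROM MyTable AS m2
            WHERE m2.COL1 = m.COL1 AND m2.ID <= m.ID) AS COL2
    FROM MyTable AS m
    ORDER BY m.ID
""").fetchall()

print(rows)
# [(1, 'A', 1), (2, 'B', 1), (3, 'A', 2), (4, 'C', 1), (5, 'A', 3)]
```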

How can I select unique rows in a database over two columns?

I have found similar solutions online but none that I've been able to apply to my specific problem.
I'm trying to "unique-ify" data from one table to another. In my original table, data looks like the following:
USERIDP1 USERIDP2 QUALIFIER DATA
1 2 TRUE AB
1 2 CD
1 3 EF
1 3 GH
The user IDs are composed of two parts, USERIDP1 and USERIDP2 concatenated. I want to transfer all the rows that correspond to a user who has QUALIFIER=TRUE in ANY row they own, but ignore users who do not have a TRUE QUALIFIER in any of their rows.
To clarify, all of User 12's rows would be transferred, but not User 13's. The output would then look like:
USERIDP1 USERIDP2 QUALIFIER DATA
1 2 TRUE AB
1 2 CD
So basically, I need to find rows with distinct user ID components (involving two unique fields) that also possess a row with QUALIFIER=TRUE and copy all and only all of those users' rows.
Although this nested query will be very slow for large tables, this could do it.
SELECT DISTINCT X.USERIDP1, X.USERIDP2, X.QUALIFIER, X.DATA
FROM YOUR_TABLE_NAME AS X
WHERE EXISTS (SELECT 1 FROM YOUR_TABLE_NAME AS Y WHERE Y.USERIDP1 = X.USERIDP1
AND Y.USERIDP2 = X.USERIDP2 AND Y.QUALIFIER = TRUE)
It could be written as an inner join with itself too:
SELECT DISTINCT X.USERIDP1, X.USERIDP2, X.QUALIFIER, X.DATA
FROM YOUR_TABLE_NAME AS X
INNER JOIN YOUR_TABLE_NAME AS Y ON Y.USERIDP1 = X.USERIDP1
AND Y.USERIDP2 = X.USERIDP2 AND Y.QUALIFIER = TRUE
For a large table, create a new auxiliary table containing only USERIDP1 and USERIDP2 columns for rows that have QUALIFIER = TRUE and then join this table with your original table using inner join similar to the second option above. Remember to create appropriate indexes.
This should do the trick. Partitioning by both id columns avoids having to cast integer ids to varchar and concatenate them:
SELECT 1 as id1,2 as id2,'TRUE' as qualifier,'AB' as data into #sampled
UNION ALL SELECT 1,2,NULL,'CD'
UNION ALL SELECT 1,3,NULL,'EF'
UNION ALL SELECT 1,3,NULL,'GH'
;WITH data as
(
SELECT
id1
,id2
,qualifier
,data
,SUM(CASE WHEN qualifier = 'TRUE' THEN 1 ELSE 0 END)
OVER (PARTITION BY id1, id2) as num_qualifier
from #sampled
)
SELECT
id1
,id2
,qualifier
,data
from data
where num_qualifier > 0
Select *
from yourTable
INNER JOIN (Select UserIDP1, UserIDP2 FROM yourTable WHERE Qualifier=TRUE) B
ON yourTable.UserIDP1 = B.UserIDP1 and YourTable.UserIDP2 = B.UserIDP2
How about a subquery as a where clause?
SELECT *
FROM theTable t1
WHERE CAST(t1.useridp1 AS VARCHAR) + CAST(t1.useridp2 AS VARCHAR) IN
(SELECT CAST(t2.useridp1 AS VARCHAR) + CAST(t2.useridp2 AS VARCHAR)
FROM theTable t2
WHERE t2.qualifier
);
This is a solution in MySQL, but I believe it should transfer to SQL Server pretty easily. Use a subquery to pick out groups of (id1, id2) combinations with at least one True 'qualifier' row; then join that to the original table on (id1, id2).
mysql> SELECT u1.*
FROM users u1
JOIN (SELECT id1,id2
FROM users
WHERE qualifier
GROUP BY id1, id2) u2
USING(id1, id2);
+------+------+-----------+------+
| id1 | id2 | qualifier | data |
+------+------+-----------+------+
| 1 | 2 | 1 | aa |
| 1 | 2 | 0 | bb |
+------+------+-----------+------+
2 rows in set (0.00 sec)
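To see the "copy all rows of qualifying users" step as a whole, here is a small sketch of the EXISTS variant in SQLite via Python's sqlite3 module. The table names source and target are hypothetical, and storing QUALIFIER as 1 or NULL is an assumption about how the original data is represented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table; QUALIFIER is 1 for TRUE, NULL when blank.
    CREATE TABLE source (USERIDP1 INTEGER, USERIDP2 INTEGER,
                         QUALIFIER INTEGER, DATA TEXT);
    INSERT INTO source VALUES
        (1, 2, 1,    'AB'),
        (1, 2, NULL, 'CD'),
        (1, 3, NULL, 'EF'),
        (1, 3, NULL, 'GH');

    -- Empty copy of the source layout to receive the transferred rows.
    CREATE TABLE target AS SELECT * FROM source WHERE 0;

    -- Copy every row whose (USERIDP1, USERIDP2) pair has at least one
    -- row with QUALIFIER set.
    INSERT INTO target
    SELECT x.*
    FROM source AS x
    WHERE EXISTS (SELECT 1
                  FROM source AS y
                  WHERE y.USERIDP1 = x.USERIDP1
                    AND y.USERIDP2 = x.USERIDP2
                    AND y.QUALIFIER = 1);
""")

copied = conn.execute(
    "SELECT USERIDP1, USERIDP2, QUALIFIER, DATA FROM target ORDER BY DATA"
).fetchall()
print(copied)  # [(1, 2, 1, 'AB'), (1, 2, None, 'CD')]
```

Comparing the two id columns separately, rather than concatenating them, also keeps pairs such as (1, 23) and (12, 3) apart.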

Returning several rows from a single query, based on a value of a column

Let's say I have this table:
|Fld | Number|
1 5
2 2
And I want to make a select that returns each Fld as many times as its Number field says:
|Fld |
1
1
1
1
1
2
2
How can I achieve this? I was thinking about making a temporary table and inserting data based on the Number, but I was wondering if this could be done with a single SELECT statement.
PS: I'm new to SQL
You can join with a numbers table:
SELECT Fld
FROM yourtable
JOIN Numbers
ON Numbers.Number <= yourtable.Number
A numbers table is just a table with a list of numbers:
Number
1
2
3
etc...
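As a quick illustration of the numbers-table idea, here is a sketch in SQLite via Python's sqlite3 module. The join keeps one Numbers row for each value from 1 up to the row's Number, so each Fld is repeated that many times. Filling Numbers with 1 to 10 is an arbitrary choice for the sample; in practice it must reach at least the largest Number in your table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourtable (Fld INTEGER, Number INTEGER);
    INSERT INTO yourtable VALUES (1, 5), (2, 2);

    -- Helper table holding 1..10 (sample size chosen for illustration).
    CREATE TABLE Numbers (Number INTEGER PRIMARY KEY);
    INSERT INTO Numbers VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
""")

# Each source row matches the Numbers rows 1..Number, so it is
# returned Number times.
expanded = [row[0] for row in conn.execute("""
    SELECT yourtable.Fld
    FROM yourtable
    JOIN Numbers
      ON Numbers.Number <= yourtable.Number
    ORDER BY yourtable.Fld
""")]

print(expanded)  # [1, 1, 1, 1, 1, 2, 2]
```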
Not a great solution (since you still query your table twice), but maybe you can work from it:
SELECT t1.fld, t1.number
FROM table t1, (
SELECT ROWNUM number FROM dual
CONNECT BY LEVEL <= (SELECT MAX(number) FROM table)) t2
WHERE t2.number<=t1.number
It generates the maximum number of rows needed and then filters them for each row.
I don't know if your RDBMS version supports it (although I rather suspect it does), but here is a recursive version:
WITH remaining (fld, times) as (SELECT fld, 1
FROM <table>
UNION ALL
SELECT a.fld, a.times + 1
FROM remaining as a
JOIN <table> as b
ON b.fld = a.fld
AND b.number > a.times)
SELECT fld
FROM remaining
ORDER BY fld
Given your source data table, it outputs this (count included for verification):
fld times
=============
1 1
1 2
1 3
1 4
1 5
2 1
2 2